Abstract

The memory model of a shared-memory multiprocessor is a contract between the designer and programmer of the multiprocessor. The sequential consistency memory model specifies a total order among the memory (read and write) events performed at each processor. A trace of a memory system satisfies sequential consistency if there exists a total order of all memory events in the trace that is both consistent with the total order at each processor and has the property that every read event to a location returns the value of the last write to that location.

Descriptions of shared-memory systems are typically parameterized by the number of processors, the number of memory locations, and the number of data values. It has been shown that even for finite parameter values, verifying sequential consistency on general shared-memory systems is undecidable. We observe that, in practice, shared-memory systems satisfy the properties of causality and data independence. Causality is the property that values of read events flow from values of write events. Data independence is the property that all traces can be generated by renaming data values from traces where the written values are distinct from each other. If a causal and data independent system also has the property that the logical order of write events to each location is identical to their temporal order, then sequential consistency can be verified algorithmically. Specifically, we present a model checking algorithm to verify sequential consistency on such systems for a finite number of processors and memory locations and an arbitrary number of data values.



1 Introduction 

Shared-memory multiprocessors are very complex computer systems. Multi- 
threaded programs running on shared-memory multiprocessors use an abstract 
view of the shared memory that is specified by a memory model. Examples
of memory models for multiprocessors include sequential consistency [Lam79],
partial store ordering [WG99], and the Alpha memory model [Com98]. The
implementation of the memory model, achieved by a protocol running either
in hardware or software, is one of the most complex aspects of multiprocessor 
design. These protocols are commonly referred to as cache-coherence protocols. 
Since parallel programs running on such systems rely on the memory model for 
their correctness, it is important to implement the protocols correctly. However, 
since efficiency is important for the commercial viability of these systems, the 
protocols are heavily optimized, making them prone to design errors. Formal 
verification of cache-coherence protocols can detect these errors effectively. 

Descriptions of cache-coherence protocols are typically parameterized by the 
number of processors, the number of memory locations, and the number of data 
values. Verifying parameterized systems for arbitrary values of these parame- 
ters is undecidable for nontrivial systems. Interactive theorem proving is one 
approach to parameterized verification. This approach is not automated and is 
typically expensive in terms of the required human effort. Another approach is 
to model check a parameterized system for small values of the parameters. This 
is a good debugging technique that can find a number of errors prior to the more 
time-consuming effort of verification for arbitrary parameter values. In this pa- 
per, we present an automatic method based on model checking to verify that a 
cache-coherence protocol with fixed parameter values is correct with respect to
the sequential consistency memory model. 

The sequential consistency memory model [Lam79] specifies a total order
among the memory events (reads and writes) performed locally at each proces- 
sor. This total order at a processor is the order in which memory events occur 
at that processor. A trace of a memory system satisfies sequential consistency 
if there exists a total order of all memory events that is both consistent with 
the local total order at each processor, and has the property that every read to 
a location returns the latest (according to the total order) value written to that 
location. Surprisingly, verifying sequential consistency, even for fixed
parameter values, is undecidable [AMP96]. Intuitively, this is because the witness total
order could be quite different from the global temporal order of events for some 
systems. An event might need to be logically ordered after an event that occurs 
much later in a run. Hence any algorithm needs to keep track of a potentially 
unbounded history of a run. 

In this paper, we consider the problem of verifying that a shared-memory
system S(n, m, v) with n processors, m locations and v data values is sequen- 
tially consistent. We present a method that can check sequential consistency for 
any fixed n and m and for arbitrary v. The correctness of our method depends 
on two assumptions — causality and data independence. The property of causal- 
ity arises from the observation that protocols do not conjure up data values; 
data is injected into the system by the initial values stored in the memory and 
by the writes performed by the processors. Therefore every read operation r to
location l is associated with either the initial value of l or some write operation
w to l that wrote the value read by r. The property of data independence arises
from the observation that protocols do not examine data values; they just
forward the data from one component of the system (cache or memory) to another.
Since protocol behavior is not affected by the data values, we can restrict our 
attention, without loss of generality, to unambiguous runs in which the writ- 
ten data values to a location are distinct from each other and from the initial 
value. We have observed that these two assumptions are true of shared-memory
systems that occur in practice [LLG+90, KOH+94, BDH+99, BGM+00].



For a causal and unambiguous run, we can deduce the association between 
a read and the associated write just by looking at their data values. This leads 
to a vast simplification in the task of specifying the witness total order for 
sequential consistency. It suffices to specify, for each location, a total order on
the writes to that location. By virtue of the association of write events and 
read events, the total order on the write events can be extended to a partial 
order on all memory events (both reads and writes) to that location. If a 
read event r reads the value written by the write event w, the partial order
puts r after w and all write events preceding w, and before all write events 
succeeding w. As described before, sequential consistency specifies a total order 
on the memory events for each processor. Thus, there are n total orders, one 
for each processor, and m partial orders, one for each location, imposed on the 
graph of memory events of a run. A necessary and sufficient condition for the
run to be sequentially consistent is that this graph is acyclic for some choice of
the write orders. We further show that the existence of a cycle in this graph
implies the existence of a nice cycle in which no two processor edges (imposed
by the memory model) are for the same processor and no two location edges
(imposed by the write order) are for the same location. This implies that a nice
cycle can have at most $2 \times \min(n, m)$ edges; we call a nice cycle with $2k$
edges a $k$-nice cycle. Further, if the memory
system is symmetric with respect to processor and location ids, then processor 
and location edges occur in a certain canonical order in the nice cycle. These 
two observations drastically reduce the number of cycles for any search. 
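A minimal Python sketch of the two niceness conditions follows. It assumes a hypothetical encoding in which a cycle is given as the list of its edge labels, ("P", i) for an edge contributed by processor i and ("L", j) for an edge contributed by location j; the helper names are illustrative only. The sketch classifies a given cycle and does not search for cycles.

```python
from typing import List, Optional, Tuple

# Hypothetical edge label: ("P", i) for processor i, ("L", j) for location j.
Edge = Tuple[str, int]


def is_nice(cycle: List[Edge]) -> bool:
    """No two processor edges share a processor and no two location
    edges share a location."""
    procs = [i for kind, i in cycle if kind == "P"]
    locs = [j for kind, j in cycle if kind == "L"]
    return len(procs) == len(set(procs)) and len(locs) == len(set(locs))


def nice_k(cycle: List[Edge]) -> Optional[int]:
    """Return k if the cycle is nice and has 2k edges, otherwise None."""
    if not is_nice(cycle) or len(cycle) % 2 != 0:
        return None
    return len(cycle) // 2


def max_nice_edges(n: int, m: int) -> int:
    """The bound from the text: a nice cycle has at most 2 * min(n, m) edges."""
    return 2 * min(n, m)
```

For example, the cycle [("P", 1), ("L", 2), ("P", 2), ("L", 1)] is 2-nice, while a cycle that uses processor 1 twice is not nice.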

We finally argue that a number of causal and data independent shared- 
memory systems occurring in practice also have the property that the witness 
write order at each location is simply the temporal order of the write events. In 
other words, a write event w is ordered before w' if w occurs before w'. We call
this a simple write order, and it is in fact the correct witness for a number of 
shared-memory systems. For cache-based shared-memory systems, the intuitive 
explanation is that at any time there is at most one cache with write privilege 
to a location. The write privilege moves from one cache to another with time. 
Hence, the logical timestamps [Lam78] of the writes to a location order them
exactly according to their global temporal order. We show that the proof that
a simple write order is a correct witness for a memory system can be performed
by model checking [CE81, QS81]. Specifically, the proof for the memory system
$S(n,m,v)$ for fixed $n$ and $m$ and arbitrary $v$ is broken into $\min(n,m)$ model
checking lemmas, where the $k$-th lemma checks for the existence of canonical
$k$-nice cycles.
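Under the simple write order, the witness can be read off a trace directly. The Python sketch below assumes a hypothetical encoding of memory events as (op, proc, loc, data) tuples with op in {"R", "W"} and 1-based indices; `simple_write_order` is an illustrative name, not part of the paper.

```python
from typing import Dict, List, Sequence, Tuple

Event = Tuple[str, int, int, int]  # (op, proc, loc, data), op in {"R", "W"}


def simple_write_order(trace: Sequence[Event]) -> Dict[int, List[int]]:
    """For each location, its write indices (1-based) in temporal order:
    w is ordered before w' iff w occurs before w' in the trace."""
    order: Dict[int, List[int]] = {}
    for k, (op, _proc, loc, _data) in enumerate(trace, start=1):
        if op == "W":
            order.setdefault(loc, []).append(k)
    return order
```

The design choice is simply that no search over write orders is needed: the temporal order is taken as the candidate witness, and the remaining work is to check that it yields no cycles.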

The rest of the paper is organized as follows. Sections 2 and 3 formalize
shared-memory systems and our assumptions of causality and data independence
about them. Section 4 defines the sequential consistency memory model.
Section 5 defines the notions of a witness and a constraint graph for an
unambiguous and causal run. Sections 6 and 7 show that it is sufficient to search for
canonical nice cycles in the constraint graph. Section 8 shows how to use model
checking to detect canonical nice cycles in the constraint graphs of the runs of
a memory system. Finally, we discuss related work in Section 9 and conclude
in Section 10.

2 Shared-memory systems 

Let $\mathbb{N}$ denote the set of positive integers and $\mathbb{W}$ denote the set of non-negative integers. For any $n \geq 1$, let $\mathbb{N}_n$ denote the set of positive integers up to $n$ and $\mathbb{W}_n$ denote the set of non-negative integers up to $n$.

A memory system is parameterized by the number of processors, the number of memory locations, and the number of data values. The value 0 is used to model the initial value of all memory locations. In a memory system with $n$ processors, $m$ memory locations, and $v$ data values, read and write events denoted by $R$ and $W$ can occur at any processor in $\mathbb{N}_n$, to any location in $\mathbb{N}_m$, and have any data value in $\mathbb{W}_v$ (the data values in $\mathbb{N}_v$ together with the initial value 0). Formally, we define the following sets of events parameterized by the number of processors $n$, the number of locations $m$, and the number of data values $v$, where $n, m, v \geq 1$.

1. $E_r(n,m,v) = \{R\} \times \mathbb{N}_n \times \mathbb{N}_m \times \mathbb{W}_v$ is the set of read events.

2. $E_w(n,m,v) = \{W\} \times \mathbb{N}_n \times \mathbb{N}_m \times \mathbb{W}_v$ is the set of write events.

3. $E(n,m,v) = E_r(n,m,v) \cup E_w(n,m,v)$ is the set of memory events.

4. $E_a(n,m,v) \supseteq E(n,m,v)$ is the set of all events.

5. $E_a(n,m,v) \setminus E(n,m,v)$ is the set of internal events.

The set of all finite sequences of events in $E_a(n,m,v)$ is denoted by $E_a(n,m,v)^*$. A memory system $S(n,m,v)$ is a regular subset of $E_a(n,m,v)^*$. A sequence $\sigma \in S(n,m,v)$ is said to be a run. We denote by $S(n,m)$ the union $\bigcup_{v \geq 1} S(n,m,v)$.

Consider any $\sigma \in E_a(n,m,v)^*$. We denote the length of $\sigma$ by $|\sigma|$ and write $\sigma(i)$ for the $i$-th element of the sequence. The set of indices of the memory events in $\sigma$ is denoted by $dom(\sigma) = \{1 \leq k \leq |\sigma| \mid \sigma(k) \in E(n,m,v)\}$. For every memory event $e = (a,b,c,d) \in E(n,m,v)$, we define $op(e) = a$, $proc(e) = b$, $loc(e) = c$, and $data(e) = d$. The set of memory events by processor $i$ for all $1 \leq i \leq n$ is denoted by $P(\sigma,i) = \{k \in dom(\sigma) \mid proc(\sigma(k)) = i\}$. The set of memory events to location $i$ for all $1 \leq i \leq m$ is denoted by $L(\sigma,i) = \{k \in dom(\sigma) \mid loc(\sigma(k)) = i\}$. For all $1 \leq i \leq m$, the set of write events to location $i$ is denoted by $L_w(\sigma,i) = \{k \in L(\sigma,i) \mid op(\sigma(k)) = W\}$, and the set of read events to location $i$ is denoted by $L_r(\sigma,i) = \{k \in L(\sigma,i) \mid op(\sigma(k)) = R\}$.
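The index sets above translate almost literally into code. The following Python sketch assumes a hypothetical encoding: a memory event is a 4-tuple (op, proc, loc, data) with op equal to "R" or "W", any other tuple stands for an internal event, and indices are 1-based as in the text.

```python
from typing import Sequence, Set, Tuple

Event = Tuple[object, ...]  # memory event: (op, proc, loc, data); other tuples are internal


def is_memory_event(e: Event) -> bool:
    return len(e) == 4 and e[0] in ("R", "W")


def dom(sigma: Sequence[Event]) -> Set[int]:
    """1-based indices of the memory events in sigma."""
    return {k for k, e in enumerate(sigma, start=1) if is_memory_event(e)}


def P(sigma: Sequence[Event], i: int) -> Set[int]:
    """Memory events performed by processor i."""
    return {k for k in dom(sigma) if sigma[k - 1][1] == i}


def L(sigma: Sequence[Event], i: int) -> Set[int]:
    """Memory events to location i."""
    return {k for k in dom(sigma) if sigma[k - 1][2] == i}


def L_w(sigma: Sequence[Event], i: int) -> Set[int]:
    """Write events to location i."""
    return {k for k in L(sigma, i) if sigma[k - 1][0] == "W"}


def L_r(sigma: Sequence[Event], i: int) -> Set[int]:
    """Read events to location i."""
    return {k for k in L(sigma, i) if sigma[k - 1][0] == "R"}
```

For example, for the sequence (ACKX, 1, 1) (UPD, 1) (W, 1, 1, 1) (R, 2, 1, 0), `dom` returns {3, 4} and `P(sigma, 2)` returns {4}.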
typedef Msg {m : {ACKS, ACKX}, a : ℕ_m, d : 𝕎_v} ∪ {m : {INVAL}, a : ℕ_m};
typedef CacheEntry {d : 𝕎_v, s : {INV, SHD, EXC}};
cache : array ℕ_n of array ℕ_m of CacheEntry;
inQ : array ℕ_n of Queue(Msg);
owner : array ℕ_m of 𝕎_n;

Initial predicate

∀i ∈ ℕ_n, j ∈ ℕ_m : (cache[i][j] = (0, SHD) ∧ isEmpty(inQ[i]) ∧ owner[j] ≠ 0)

Events

(R, i, j, k)    cache[i][j].s ≠ INV ∧ cache[i][j].d = k →
(W, i, j, k)    cache[i][j].s = EXC →
                    cache[i][j].d := k
(ACKX, i, j)    cache[i][j].s ≠ EXC ∧ owner[j] ≠ 0 →
                    for each (p ∈ ℕ_n)
                        if (p = i) then
                            inQ[p] := append(inQ[p], (ACKX, j, cache[owner[j]][j].d))
                        else if (p ≠ owner[j] ∧ cache[p][j].s ≠ INV) then
                            inQ[p] := append(inQ[p], (INVAL, j));
                    if (owner[j] ≠ i) then cache[owner[j]][j].s := INV;
                    owner[j] := 0
(ACKS, i, j)    cache[i][j].s = INV ∧ owner[j] ≠ 0 →
                    cache[owner[j]][j].s := SHD;
                    inQ[i] := append(inQ[i], (ACKS, j, cache[owner[j]][j].d));
                    owner[j] := 0
(UPD, i)        ¬isEmpty(inQ[i]) →
                    let msg = head(inQ[i]) in
                        if (msg.m = INVAL) then
                            cache[i][msg.a].s := INV
                        else if (msg.m = ACKS) then {
                            cache[i][msg.a] := (msg.d, SHD);
                            owner[msg.a] := i
                        } else {
                            cache[i][msg.a] := (msg.d, EXC);
                            owner[msg.a] := i
                        };
                    inQ[i] := tail(inQ[i])

Figure 1: Example of a memory system
The subsequence obtained by projecting $\sigma$ onto $dom(\sigma)$ is denoted by $\overline{\sigma}$. If $\sigma \in S(n,m,v)$, the sequence $\overline{\sigma}$ is a trace of $S(n,m,v)$. Similarly, if $\sigma \in S(n,m)$, the sequence $\overline{\sigma}$ is a trace of $S(n,m)$.
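Projecting a run onto its trace just drops the internal events. A minimal Python sketch, assuming a hypothetical tuple encoding in which memory events are 4-tuples (op, proc, loc, data) and internal events are any other tuples; `trace_of` is an illustrative name:

```python
from typing import List, Sequence, Tuple


def trace_of(sigma: Sequence[Tuple[object, ...]]) -> List[Tuple[object, ...]]:
    """Keep the memory events (4-tuples whose op is "R" or "W"), in order."""
    return [e for e in sigma if len(e) == 4 and e[0] in ("R", "W")]


run = [("ACKX", 1, 1), ("UPD", 1), ("W", 1, 1, 1), ("R", 2, 1, 0),
       ("UPD", 2), ("ACKS", 2, 1), ("UPD", 2), ("R", 2, 1, 1)]
print(trace_of(run))  # [('W', 1, 1, 1), ('R', 2, 1, 0), ('R', 2, 1, 1)]
```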

Example. Consider the memory system in Figure 1. It is a highly simplified model of the protocol used to maintain cache coherence within a single node in the Piranha chip multiprocessor system [BGM+00]. The system has three variables (cache, inQ and owner) and five events: the memory events {R, W} and the internal events {ACKX, ACKS, UPD}. The variables inQ and owner need some explanation. For each processor i, there is an input queue inQ[i] where incoming messages are put. The type of inQ[i] is Queue. The operations isEmpty, head and tail are defined on Queue, and the operation append is defined on Queue × Msg. They have the obvious meanings and their definitions have been omitted in the figure. For each memory location j, either owner[j] = 0 or owner[j] contains the index of a processor. Each event is associated with a guarded command. The memory events R and W are parameterized
by three parameters — processor i, location j and data value k. The internal 
events ACKX and ACKS are parameterized by two parameters — processor i 
and location j. The internal event UPD is parameterized by processor i. A 
state is a valuation to the variables. An initial state is a state that satisfies 
the initial predicate. An event is enabled in a state if the guard of its guarded 
command is true in the state. The variables are initialized to an initial state 
and updated by nondeterministically choosing an enabled event and executing 
the guarded command corresponding to it. A run of the system is any finite 
sequence of events that can be executed starting from some initial state. 

A processor i can perform a read to location j if cache[i][j].s ∈ {SHD, EXC}; otherwise it requests owner[j] for shared access to location j. The processor owner[j] is the last one to have received shared or exclusive access to location j. The request by i has been abstracted away, but the response of owner[j] is modeled by the event (ACKS, i, j), which sends an ACKS message containing the data in location j to i and temporarily sets owner[j] to 0. Similarly, processor i can perform a write to location j if cache[i][j].s = EXC; otherwise it requests owner[j] for exclusive access to location j. The processor owner[j] responds by sending an ACKX message to i and INVAL messages to all other processors that have a valid copy of location j. owner[j] is set to i when processor i reads the ACKS or ACKX message from inQ[i] in the event (UPD, i). Note that new requests for j are blocked while owner[j] = 0. A processor i that receives an INVAL message for location j sets cache[i][j].s to INV. ■
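As an illustration only, the guarded commands of Figure 1 can be executed directly. The Python sketch below is a hypothetical transcription, not the Piranha protocol itself: cache entries are [d, s] lists, messages are tuples, queues are `collections.deque` objects, the initial owner of every location is assumed to be processor 1, and data is read from `cache[owner[j]][j]` before `owner[j]` is reset to 0. `enabled` lists the events whose guards hold and `step` executes one of them; all function names are illustrative.

```python
import random
from collections import deque
from copy import deepcopy

INV, SHD, EXC = "INV", "SHD", "EXC"


def initial_state(n, m, owners=None):
    """Every cache entry is [0, SHD], every queue is empty, owner[j] != 0.
    Assumption: owner[j] defaults to processor 1 for every location j."""
    owners = owners or {j: 1 for j in range(1, m + 1)}
    return {"cache": {i: {j: [0, SHD] for j in range(1, m + 1)} for i in range(1, n + 1)},
            "inQ": {i: deque() for i in range(1, n + 1)},
            "owner": dict(owners)}


def enabled(st, n, m, v):
    """All events whose guards hold in state st (data values range over 0..v)."""
    cache, inq, owner = st["cache"], st["inQ"], st["owner"]
    evs = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d, s = cache[i][j]
            if s != INV:
                evs.append(("R", i, j, d))  # the guard forces k = cache[i][j].d
            if s == EXC:
                evs.extend(("W", i, j, k) for k in range(v + 1))
            if s != EXC and owner[j] != 0:
                evs.append(("ACKX", i, j))
            if s == INV and owner[j] != 0:
                evs.append(("ACKS", i, j))
        if inq[i]:
            evs.append(("UPD", i))
    return evs


def step(st, ev, n):
    """Execute the guarded command of ev on a copy of st (R changes nothing)."""
    st = deepcopy(st)
    cache, inq, owner = st["cache"], st["inQ"], st["owner"]
    if ev[0] == "W":
        _, i, j, k = ev
        cache[i][j][0] = k
    elif ev[0] == "ACKX":
        _, i, j = ev
        o = owner[j]
        for p in range(1, n + 1):
            if p == i:
                inq[p].append(("ACKX", j, cache[o][j][0]))
            elif p != o and cache[p][j][1] != INV:
                inq[p].append(("INVAL", j))
        if o != i:
            cache[o][j][1] = INV
        owner[j] = 0
    elif ev[0] == "ACKS":
        _, i, j = ev
        o = owner[j]
        cache[o][j][1] = SHD
        inq[i].append(("ACKS", j, cache[o][j][0]))
        owner[j] = 0
    elif ev[0] == "UPD":
        i = ev[1]
        msg = inq[i].popleft()
        if msg[0] == "INVAL":
            cache[i][msg[1]][1] = INV
        else:
            cache[i][msg[1]] = [msg[2], SHD if msg[0] == "ACKS" else EXC]
            owner[msg[1]] = i
    return st


def replay(run, n, m, v):
    """Check that run is a run of the system from the default initial state."""
    st = initial_state(n, m)
    for ev in run:
        if ev not in enabled(st, n, m, v):
            raise ValueError(f"event {ev} is not enabled")
        st = step(st, ev, n)
    return st


def random_run(n, m, v, length, seed=0):
    """Generate a run by repeatedly picking an enabled event at random."""
    rng = random.Random(seed)
    st, run = initial_state(n, m), []
    for _ in range(length):
        evs = enabled(st, n, m, v)
        if not evs:
            break
        ev = rng.choice(evs)
        run.append(ev)
        st = step(st, ev, n)
    return run
```

For instance, `replay` accepts the eight-event run (ACKX, 1, 1) (UPD, 1) (W, 1, 1, 1) (R, 2, 1, 0) (UPD, 2) (ACKS, 2, 1) (UPD, 2) (R, 2, 1, 1) with n = 2, m = 1, v = 1, after which processor 2 holds the value 1 in shared mode. Note how processor 2 can still read the stale value 0 while its INVAL message is queued.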



3 Causality and data independence 

In this section, we state our main assumptions on memory systems: causality and data independence.

We assume that the runs of memory systems are causal. That is, every read event to a location $l$ "reads" either the initial value of $l$ or the value "written" to $l$ by some write event. We believe that this assumption is reasonable because memory systems do not conjure up data values; they just move around data values that were introduced by initial values or write events. We state the causality assumption formally as follows.

Assumption 1 (Causality) For all $n, m, v \geq 1$, for all traces $\tau$ of $S(n,m,v)$, and for all locations $1 \leq i \leq m$, if $x \in L_r(\tau,i)$, then either $data(\tau(x)) = 0$ or there is $y \in L_w(\tau,i)$ such that $data(\tau(x)) = data(\tau(y))$.
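Assumption 1 constrains individual traces, so it can be checked on any given trace. A minimal Python sketch, assuming the hypothetical (op, proc, loc, data) tuple encoding of memory events; `is_causal` is an illustrative name:

```python
from typing import Sequence, Tuple

Event = Tuple[str, int, int, int]  # (op, proc, loc, data)


def is_causal(trace: Sequence[Event]) -> bool:
    """Each read returns 0 or a value written to the same location
    somewhere in the trace (Assumption 1 for this one trace)."""
    written = {(loc, d) for op, _p, loc, d in trace if op == "W"}
    return all(d == 0 or (loc, d) in written
               for op, _p, loc, d in trace if op == "R")
```

Checking traces one at a time only refutes causality; the assumption itself quantifies over all traces of the system.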

The memory system in Figure 1 is causal. Only the write event W introduces
a fresh data value in the system by updating the cache; the internal events 
ACKS, ACKX and UPD move data around and the read event R reads the 
data present in the cache. Therefore, the data value of a read operation must 
either be the initial value or the value introduced by a write event, thus 
satisfying Assumption 1.

Memory systems occurring in practice also have the property of data inde- 
pendence, that is, control decisions are oblivious to the data values. A cache 
line carries along with the actual program data a few state bits for recording 
whether it is in shared, exclusive or invalid mode. Typically, actions do not 
depend on the value of the data in the cache line. This can be observed, for 
example, in the memory system shown in Figure 1. Note that there are no
predicates involving the data fields of the cache lines and the messages in any 
of the internal events of the system. In such systems, renaming the data values 
of a run results in yet another run of the system. Moreover, every run can 
be obtained by data value renaming from some run in which the initial value 
and values of write events to any location i are all distinct from each other. In 
order to define data independence formally, we define below the notion of an 
unambiguous run and the notion of data value renaming. 

Formally, a run $\sigma$ of $S(n,m,v)$ is unambiguous if for all $1 \leq i \leq m$ and $x \in L_w(\sigma,i)$, we have (1) $data(\sigma(x)) \neq 0$, and (2) $data(\sigma(x)) \neq data(\sigma(y))$ for all $y \in L_w(\sigma,i) \setminus \{x\}$. In an unambiguous run, every write event to a location $i$ has a value distinct from the initial value of $i$ and the value of every other write to $i$. The trace $\overline{\sigma}$ corresponding to an unambiguous run $\sigma$ is called an unambiguous trace. If a run is both unambiguous and causal, each read event to location $i$ with data value 0 reads the initial value of $i$, and each read event with a nonzero data value reads the value written by the unique write event with a matching data value. Thus, a read event can be paired with its source write event just by comparing data values.
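Because a causal, unambiguous trace determines the source of every read, both the unambiguity test and the read-to-write pairing are simple scans. A Python sketch under the same hypothetical (op, proc, loc, data) tuple encoding; `source_writes` assumes, rather than checks, that the trace is causal and unambiguous.

```python
from typing import Dict, Optional, Sequence, Tuple

Event = Tuple[str, int, int, int]  # (op, proc, loc, data)


def is_unambiguous(trace: Sequence[Event]) -> bool:
    """Writes to a location are nonzero and pairwise distinct."""
    seen = set()
    for op, _p, loc, d in trace:
        if op == "W":
            if d == 0 or (loc, d) in seen:
                return False
            seen.add((loc, d))
    return True


def source_writes(trace: Sequence[Event]) -> Dict[int, Optional[int]]:
    """Map each read index (1-based) to the index of the write it reads,
    or None if it reads the initial value 0."""
    writer = {(loc, d): k for k, (op, _p, loc, d) in enumerate(trace, start=1)
              if op == "W"}
    return {k: (None if d == 0 else writer[(loc, d)])
            for k, (op, _p, loc, d) in enumerate(trace, start=1) if op == "R"}
```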

A function $\lambda : \mathbb{N}_m \times \mathbb{W} \to \mathbb{W}$ is called a renaming function if $\lambda(j, 0) = 0$ for all $1 \leq j \leq m$. Intuitively, the function $\lambda$ provides for each memory location $c$ and data value $d$ the renamed data value $\lambda(c,d)$. Since 0 models the fixed initial value of all locations, the function $\lambda$ does not rename the value 0. Let $\lambda_d$ be a function on $E(n,m,v)$ such that for all $e = (a,b,c,d) \in E(n,m,v)$, we have $\lambda_d(e) = (a,b,c,\lambda(c,d))$. The function $\lambda_d$ is extended to sequences in $E(n,m,v)^*$ in the natural way.
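Applying $\lambda_d$ to a trace is a pointwise map over its events. A minimal Python sketch, assuming the hypothetical (op, proc, loc, data) tuple encoding and representing $\lambda$ as an ordinary Python function of (location, value); `rename` is an illustrative name:

```python
from typing import Callable, List, Sequence, Tuple

Event = Tuple[str, int, int, int]      # (op, proc, loc, data)
Renaming = Callable[[int, int], int]   # lam(location, value) -> new value


def rename(trace: Sequence[Event], lam: Renaming) -> List[Event]:
    """Apply lambda_d to every event: (a, b, c, d) -> (a, b, c, lam(c, d))."""
    for _op, _proc, c, _d in trace:
        # A renaming function must fix the initial value 0 at every location;
        # this only checks the locations that occur in the trace.
        if lam(c, 0) != 0:
            raise ValueError(f"lam({c}, 0) must be 0")
    return [(a, b, c, lam(c, d)) for a, b, c, d in trace]
```

Renaming an unambiguous trace with a function that is not injective per location (for example, one mapping every nonzero value to 1) is exactly how ambiguous traces arise from unambiguous ones.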

We state the data independence assumption formally as follows. 
Assumption 2 (Data independence) For all $n, m, v \geq 1$ and sequences $\tau \in E(n,m,v)^*$, we have that $\tau$ is a trace of $S(n,m,v)$ iff there is an unambiguous trace $\tau'$ of $S(n,m)$ and a renaming function $\lambda : \mathbb{N}_m \times \mathbb{W} \to \mathbb{W}_v$ such that $\tau = \lambda_d(\tau')$.

Assumptions 1 and 2 are motivated by the data handling in typical cache-coherence protocols. We can make these assumptions hold for protocol descriptions by imposing restrictions on the operations allowed on variables that contain data values [Nal99]. For example, one restriction can be that no data variable appears in the guard expression of an internal event or in the control expression of a conditional.



4 Sequential consistency 

Suppose $S(n,m,v)$ is a memory system for some $n, m, v \geq 1$. The sequential consistency memory model [Lam79] is a correctness requirement on the runs of $S(n,m,v)$. In this section, we define sequential consistency formally.

We first define the simpler notion of a sequence being serial. For all $\tau \in E(n,m,v)^*$ and $1 \leq i \leq |\tau|$, let $upto(\tau,i)$ be the set $\{1 \leq k \leq i \mid op(\tau(k)) = W \wedge loc(\tau(k)) = loc(\tau(i))\}$. In other words, the set $upto(\tau,i)$ is the set of write events in $\tau$ to location $loc(\tau(i))$ occurring not later than $i$. A sequence $\tau \in E(n,m,v)^*$ is serial if for all $1 \leq u \leq |\tau|$, we have

$$data(\tau(u)) = \begin{cases} 0 & \text{if } upto(\tau,u) = \emptyset,\\ data(\tau(\max(upto(\tau,u)))) & \text{if } upto(\tau,u) \neq \emptyset. \end{cases}$$

Thus, a sequence is serial if every read to a location $i$ returns the value of the latest write to $i$ if one exists, and the initial value otherwise.¹
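Since $upto(\tau,u)$ only looks backwards, seriality can be checked in one left-to-right pass that remembers the last value written to each location. A Python sketch, assuming the hypothetical (op, proc, loc, data) tuple encoding; `is_serial` is an illustrative name:

```python
from typing import Dict, Sequence, Tuple

Event = Tuple[str, int, int, int]  # (op, proc, loc, data)


def is_serial(seq: Sequence[Event]) -> bool:
    """Every read returns the latest earlier write to its location, or 0 if
    there is none. A write is its own latest write, so it always passes."""
    last: Dict[int, int] = {}  # location -> data of the latest write so far
    for op, _p, loc, d in seq:
        if op == "W":
            last[loc] = d
        elif d != last.get(loc, 0):
            return False
    return True
```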

The sequential consistency memory model $M$ is a function that maps every sequence of memory events $\tau \in E(n,m,v)^*$ and processor $1 \leq i \leq n$ to a total order $M(\tau,i)$ on $P(\tau,i)$ defined as follows: for all $u, v \in P(\tau,i)$, we have $(u,v) \in M(\tau,i)$ iff $u < v$. A sequence $\tau$ is sequentially consistent if there is a permutation $f$ on $\mathbb{N}_{|\tau|}$ such that the following conditions are satisfied.

C1 For all $1 \leq u, v \leq |\tau|$ and $1 \leq i \leq n$, if $(u,v) \in M(\tau,i)$ then $f(u) < f(v)$.

C2 The sequence $\tau' = \tau(f^{-1}(1))\,\tau(f^{-1}(2)) \cdots \tau(f^{-1}(|\tau|))$ is serial.

Intuitively, the sequence $\tau'$ is a permutation of the sequence $\tau$ such that the event at index $u$ in $\tau$ is moved to index $f(u)$ in $\tau'$. According to C1, this permutation must respect the total order $M(\tau,i)$ for all $1 \leq i \leq n$. According to C2, the permuted sequence must be serial. A run $\sigma \in S(n,m,v)$ is sequentially consistent if $\overline{\sigma}$ satisfies $M$. The memory system $S(n,m,v)$ is sequentially consistent iff every run of $S(n,m,v)$ is sequentially consistent.
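Conditions C1 and C2 can be made concrete with a brute-force search over all permutations. This exhaustive check is exponential in the trace length and is not the method of this paper; it is a sketch for intuition only, assuming the hypothetical (op, proc, loc, data) tuple encoding and an illustrative function name `sc_witness`.

```python
from itertools import permutations
from typing import Optional, Sequence, Tuple

Event = Tuple[str, int, int, int]  # (op, proc, loc, data)


def is_serial(seq: Sequence[Event]) -> bool:
    last = {}  # location -> data of the latest write so far
    for op, _p, loc, d in seq:
        if op == "W":
            last[loc] = d
        elif d != last.get(loc, 0):
            return False
    return True


def sc_witness(trace: Sequence[Event]) -> Optional[Tuple[int, ...]]:
    """Return an order (f^-1(1), ..., f^-1(|trace|)) of 1-based indices that
    satisfies C1 and C2, or None if no such permutation exists."""
    idx = range(1, len(trace) + 1)
    for order in permutations(idx):
        pos = {u: p for p, u in enumerate(order)}  # pos[u] plays the role of f(u)
        c1 = all(pos[u] < pos[w] for u in idx for w in idx
                 if u < w and trace[u - 1][1] == trace[w - 1][1])
        if c1 and is_serial([trace[u - 1] for u in order]):
            return order
    return None
```

On the trace (W, 1, 1, 1) (R, 2, 1, 0) (R, 2, 1, 1) the search returns the order (2, 1, 3), while the "store buffering" trace (W, 1, 1, 1) (R, 1, 2, 0) (W, 2, 2, 1) (R, 2, 1, 0) has no witness.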

¹ The decision to model the initial values of all locations by the value 0 is implicit in our definition of a serial sequence.
The memory system in Figure 1 is supposed to be sequentially consistent. Here is an example of a sequentially consistent run $\sigma$ of that memory system, the corresponding trace $\tau$ of $\sigma$, and the sequence $\tau'$ obtained by permuting $\tau$.

$$\sigma = (ACKX,1,1)\ (UPD,1)\ (W,1,1,1)\ (R,2,1,0)\ (UPD,2)\ (ACKS,2,1)\ (UPD,2)\ (R,2,1,1)$$

$$\tau = (W,1,1,1)\ (R,2,1,0)\ (R,2,1,1) \qquad \tau' = (R,2,1,0)\ (W,1,1,1)\ (R,2,1,1)$$

Sequential consistency orders the event $\tau(2)$ before the event $\tau(3)$ at processor 2. Let $f$ be the permutation on $\mathbb{N}_3$ defined by $f(1) = 2$, $f(2) = 1$, and $f(3) = 3$. The sequence $\tau'$ is the permutation of $\tau$ under $f$. It is easy to check that both conditions C1 and C2 above are satisfied.

In order to prove that a run of a memory system is sequentially consistent, one needs to provide a reordering of the memory events of the run. This reordering should be serial and should respect the total orders imposed by sequential consistency at each processor. Since the memory systems we consider in this paper are data independent, we only need to show sequential consistency for the unambiguous runs of the memory system. This reduction is stated formally in the following theorem.

Theorem 4.1 For all $n, m \geq 1$, every trace of $S(n,m)$ is sequentially consistent iff every unambiguous trace of $S(n,m)$ is sequentially consistent.

Proof: The ($\Rightarrow$) case is trivial. For the ($\Leftarrow$) case, let $\tau$ be a trace of $S(n,m,v)$ for some $v \geq 1$. From Assumption 2, there is an unambiguous trace $\tau'$ of $S(n,m)$ and a renaming function $\lambda : \mathbb{N}_m \times \mathbb{W} \to \mathbb{W}_v$ such that $\tau = \lambda_d(\tau')$. Since $\tau'$ is sequentially consistent, there is a permutation $f$ such that conditions C1 and C2 are satisfied by $\tau'$. The same permutation $f$ satisfies both conditions for $\lambda_d(\tau')$ as well: $\lambda_d$ does not change the processor or location of any event, so C1 still holds, and since $\lambda(j,\cdot)$ maps equal values to equal values and fixes 0 at every location $j$, the permuted sequence remains serial. Therefore $\tau$ is sequentially consistent. ■



5 Witness 

Theorem 4.1 allows us to prove that a memory system $S(n,m,v)$ is sequentially consistent by proving that all unambiguous runs in $S(n,m)$ are sequentially consistent. In this section, we reduce the problem of checking sequential consistency on an unambiguous run to the problem of detecting a cycle in a constraint graph.

Consider a memory system $S(n,m,v)$ for some fixed $n, m, v \geq 1$. A witness $\Omega$ for $S(n,m,v)$ maps every trace $\tau$ of $S(n,m,v)$ and location $1 \leq i \leq m$ to a total order $\Omega(\tau,i)$ on the set of writes $L_w(\tau,i)$ to location $i$. If the trace $\tau$ is unambiguous, the total order $\Omega(\tau,i)$ on the write events to location $i$ can be extended to a partial order $\Omega_e(\tau,i)$ on all memory events (including read events) to location $i$. If a read event $r$ reads the value written by the write event $w$, the partial order puts $r$ after $w$ and all write events preceding $w$, and before all write events succeeding $w$. Formally, for every unambiguous trace $\tau$ of $S(n,m,v)$, location $1 \leq i \leq m$, and $x, y \in L(\tau,i)$, we have that $(x,y) \in \Omega_e(\tau,i)$ iff one of the following conditions holds.

1. $data(\tau(x)) = data(\tau(y))$, $op(\tau(x)) = W$, and $op(\tau(y)) = R$.

2. $data(\tau(x)) = 0$ and $data(\tau(y)) \neq 0$.

3. $\exists a, b \in L_w(\tau,i)$ such that $(a,b) \in \Omega(\tau,i)$, $data(\tau(a)) = data(\tau(x))$, and $data(\tau(b)) = data(\tau(y))$.
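The three conditions can be evaluated directly once $\Omega(\tau,i)$ is given. The Python sketch below assumes the hypothetical (op, proc, loc, data) tuple encoding and represents $\Omega(\tau,i)$ as a list of write indices from first to last; because the trace is assumed unambiguous, each nonzero value at the location identifies exactly one write, which is what condition 3 relies on. `omega_e` is an illustrative name.

```python
from typing import List, Sequence, Set, Tuple

Event = Tuple[str, int, int, int]  # (op, proc, loc, data), op in {"R", "W"}


def omega_e(trace: Sequence[Event], loc: int,
            write_order: List[int]) -> Set[Tuple[int, int]]:
    """Extend a total order on the writes to `loc` (1-based write indices,
    first to last) to the relation Omega_e on all events to `loc`."""
    events = [k for k, e in enumerate(trace, start=1) if e[2] == loc]
    op = {k: trace[k - 1][0] for k in events}
    data = {k: trace[k - 1][3] for k in events}
    rank = {data[w]: r for r, w in enumerate(write_order)}  # value -> position in Omega
    rel = set()
    for x in events:
        for y in events:
            cond1 = data[x] == data[y] and op[x] == "W" and op[y] == "R"
            cond2 = data[x] == 0 and data[y] != 0
            cond3 = (data[x] in rank and data[y] in rank
                     and rank[data[x]] < rank[data[y]])
            if cond1 or cond2 or cond3:
                rel.add((x, y))
    return rel
```

On the trace (W, 1, 1, 1) (R, 2, 1, 0) (R, 2, 1, 1) with the single write as $\Omega$, the relation is {(1, 3), (2, 1), (2, 3)}: the read of 0 precedes the write, and the read of 1 follows it.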

We now show that the relation $\Omega_e(\tau,i)$ is a partial order. First, we need the following lemma about $\Omega_e(\tau,i)$.

Lemma 5.1 For all unambiguous traces $\tau$ of $S(n,m,v)$, locations $1 \leq i \leq m$, and $r, s, t \in L(\tau,i)$, if $(r,s) \in \Omega_e(\tau,i)$, then either $(r,t) \in \Omega_e(\tau,i)$ or $(t,s) \in \Omega_e(\tau,i)$.

Proof: Since $(r,s) \in \Omega_e(\tau,i)$, either $data(\tau(s)) \neq 0$ or there is an $x \in L_w(\tau,i)$ such that $data(\tau(s)) = data(\tau(x))$. Since $\tau$ is an unambiguous trace, we have that $data(\tau(x)) \neq 0$. Therefore, we get that $data(\tau(s)) \neq 0$ in both cases. If $data(\tau(t)) = 0$, we immediately get that $(t,s) \in \Omega_e(\tau,i)$. So suppose $data(\tau(t)) \neq 0$. Since $\tau$ is causal (Assumption 1) and unambiguous, there is $y \in L_w(\tau,i)$ such that $data(\tau(t)) = data(\tau(y))$. We have three cases from the definition of $\Omega_e(\tau,i)$.

1. $data(\tau(r)) = data(\tau(s))$, $op(\tau(r)) = W$, and $op(\tau(s)) = R$. Since $\Omega(\tau,i)$ is a total order on $L_w(\tau,i)$, either $(r,y) \in \Omega(\tau,i)$ or $(y,r) \in \Omega(\tau,i)$. In the first case, we have $(r,t) \in \Omega_e(\tau,i)$. In the second case, we have $(t,s) \in \Omega_e(\tau,i)$.

2. $data(\tau(r)) = 0$ and $data(\tau(s)) \neq 0$. Since $data(\tau(t)) \neq 0$, we get that $(r,t) \in \Omega_e(\tau,i)$.

3. $\exists a, b \in L_w(\tau,i)$ such that $(a,b) \in \Omega(\tau,i)$, $data(\tau(a)) = data(\tau(r))$, and $data(\tau(b)) = data(\tau(s))$. Since $\Omega(\tau,i)$ is a total order on $L_w(\tau,i)$, either $(a,y) \in \Omega(\tau,i)$ or $(y,a) \in \Omega(\tau,i)$. In the first case, we have $(r,t) \in \Omega_e(\tau,i)$. In the second case, we have by transitivity $(y,b) \in \Omega(\tau,i)$ and therefore $(t,s) \in \Omega_e(\tau,i)$. ■

Lemma 5.2 For all unambiguous traces $\tau$ of $S(n,m,v)$ and locations $1 \leq i \leq m$, we have that $\Omega_e(\tau,i)$ is a partial order.

Proof: We show that $\Omega_e(\tau,i)$ is irreflexive. In other words, for all $1 \leq x \leq |\tau|$, we have that $(x,x) \notin \Omega_e(\tau,i)$. This is an easy proof by contradiction by assuming $(x,x) \in \Omega_e(\tau,i)$ and performing a case analysis over the three resulting conditions.

We show that $\Omega_e(\tau,i)$ is anti-symmetric. In other words, for all $1 \leq x < y \leq |\tau|$, if $(x,y) \in \Omega_e(\tau,i)$ then $(y,x) \notin \Omega_e(\tau,i)$. We do a proof by contradiction. Suppose both $(x,y) \in \Omega_e(\tau,i)$ and $(y,x) \in \Omega_e(\tau,i)$. We reason as in the proof of Lemma 5.1 to obtain $data(\tau(x)) \neq 0$ and $data(\tau(y)) \neq 0$. Therefore there are $a, b \in L_w(\tau,i)$ such that $data(\tau(a)) = data(\tau(x))$ and $data(\tau(b)) = data(\tau(y))$. We perform the following case analysis.

1. $a = b$. Either $op(\tau(x)) = R$ and $op(\tau(y)) = R$, or $op(\tau(x)) = W$ and $op(\tau(y)) = R$, or $op(\tau(x)) = R$ and $op(\tau(y)) = W$. In the first case $(x,y) \notin \Omega_e(\tau,i)$ and $(y,x) \notin \Omega_e(\tau,i)$. In the second case $(y,x) \notin \Omega_e(\tau,i)$. In the third case $(x,y) \notin \Omega_e(\tau,i)$.

2. $(a,b) \in \Omega(\tau,i)$. We have $data(\tau(x)) \neq data(\tau(y))$ since $\tau$ is unambiguous. Since $\Omega(\tau,i)$ is a total order, we have $(b,a) \notin \Omega(\tau,i)$. Therefore $(y,x) \notin \Omega_e(\tau,i)$.

3. $(b,a) \in \Omega(\tau,i)$. This case is symmetric to Case 2.

Finally, we show that $\Omega_e(\tau,i)$ is transitive. Suppose $(x,y) \in \Omega_e(\tau,i)$ and $(y,z) \in \Omega_e(\tau,i)$. From Lemma 5.1, either $(x,z) \in \Omega_e(\tau,i)$ or $(z,y) \in \Omega_e(\tau,i)$. We have shown $\Omega_e(\tau,i)$ to be anti-symmetric, so $(z,y) \notin \Omega_e(\tau,i)$. Therefore $(x,z) \in \Omega_e(\tau,i)$. ■

5.1 Constraint graph

Suppose $\tau$ is an unambiguous trace of $S(n,m,v)$. We have that $M(\tau,i)$ is a total order on $P(\tau,i)$ for all $1 \leq i \leq n$ from the definition of sequential consistency. We also have that $\Omega_e(\tau,j)$ is a partial order on $L(\tau,j)$ for all $1 \leq j \leq m$ from Lemma 5.2. The union of the $n$ total orders $M(\tau,i)$ and $m$ partial orders $\Omega_e(\tau,j)$ imposes a graph on $dom(\tau)$. As Theorem 5.3 shows, the existence of a witness for which this graph is acyclic is a necessary and sufficient condition for the trace $\tau$ to satisfy sequential consistency. We define a function $G$ that for every witness $\Omega$ returns a function $G(\Omega)$. The function $G(\Omega)$ maps every unambiguous trace $\tau$ of $S(n,m,v)$ to the graph

$$\Big(dom(\tau),\ \bigcup_{1 \leq i \leq n} M(\tau,i) \ \cup \bigcup_{1 \leq j \leq m} \Omega_e(\tau,j)\Big).$$

The work of Gibbons and Korach [GK97] defines a constraint graph on the memory events of a run that is similar to $G(\Omega)(\tau)$.
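For a single unambiguous trace and a given witness, building $G(\Omega)(\tau)$ and testing it for cycles is straightforward. The Python sketch below assumes the hypothetical (op, proc, loc, data) tuple encoding, a witness given as a dict from location to its write indices in $\Omega$ order, and Python 3.9+ for the standard `graphlib` module; `constraint_edges` and `linearize` are illustrative names. It handles one trace only, whereas the paper's method must cover all runs of a system.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Sequence, Set, Tuple

Event = Tuple[str, int, int, int]  # (op, proc, loc, data)


def constraint_edges(trace: Sequence[Event],
                     witness: Dict[int, List[int]]) -> Set[Tuple[int, int]]:
    """Edges of G(Omega)(trace): the processor orders M(trace, i) plus the
    relations Omega_e(trace, j). Assumes an unambiguous trace."""
    ranks = {j: {trace[k - 1][3]: r for r, k in enumerate(ws)}
             for j, ws in witness.items()}
    edges = set()
    for u in range(1, len(trace) + 1):
        op_u, p_u, l_u, d_u = trace[u - 1]
        for w in range(1, len(trace) + 1):
            op_w, p_w, l_w, d_w = trace[w - 1]
            if p_u == p_w and u < w:
                edges.add((u, w))  # processor edge from M(trace, p_u)
            if l_u == l_w:
                rank = ranks.get(l_u, {})
                if ((d_u == d_w and op_u == "W" and op_w == "R")           # condition 1
                        or (d_u == 0 and d_w != 0)                         # condition 2
                        or (d_u in rank and d_w in rank
                            and rank[d_u] < rank[d_w])):                   # condition 3
                    edges.add((u, w))  # location edge from Omega_e(trace, l_u)
    return edges


def linearize(trace: Sequence[Event],
              witness: Dict[int, List[int]]) -> Optional[List[int]]:
    """A topological order of G(Omega)(trace), or None if it has a cycle."""
    preds: Dict[int, Set[int]] = {u: set() for u in range(1, len(trace) + 1)}
    for u, w in constraint_edges(trace, witness):
        preds[w].add(u)
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None
```

A topological order plays the role of the permutation $f$ in the proof of Theorem 5.3 below. For the trace (W, 1, 1, 1) (R, 2, 1, 0) (R, 2, 1, 1) it returns [2, 1, 3]; for the store-buffering trace (W, 1, 1, 1) (R, 1, 2, 0) (W, 2, 2, 1) (R, 2, 1, 0) the graph contains a cycle with two processor edges and two location edges, that is, a 2-nice cycle.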

Theorem 5.3 For all $n, m, v \geq 1$, every unambiguous trace of $S(n,m,v)$ is sequentially consistent iff there is a witness $\Omega$ such that the graph $G(\Omega)(\tau)$ is acyclic for every unambiguous trace $\tau$ of $S(n,m,v)$.

Proof: (=>) Suppose r is an unambiguous trace of S(n, m, v). Then r satisfies 
sequential consistency. There is a permutation / on Ni t i such that conditions 
CI and C2 are satisfied. For all 1 < i < m, define Q(t, i) to be the total 
order on L w (r,i) such that for all x,y G L w (r,i), we have (x,y) G fi(r, i) iff 
f{x) < f(y). We show that the permutation / is a linearization of the vertices 
in G(r2)(r) that preserves all the edges. In other words, if (x,y) G M(r,i) for 
some 1 < % < n or (x,y) G il e (r,j) for some 1 < j < m, then f{x) < ,f(y). If 
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(x,y) G M(t,i) then we have from CI that /(x) < f(y). We show below that 
if (x,y)G Q e (r,i) then f(x) < f(y) . 

Let t' = Tf-in) T f- 1 {2) ■ • ■ T / _1 (I T I)' For all 1 < u < |r| we have that 
r(w) = t'(/(w)). We first show for all a G L w {r,j) and x G L(r,j), if 
data(r(a)) = data(T(x)) then /(a) < /(x). Since r is unambiguous, we have 
that data(T(a)) = data{r(x)) ^ 0. Therefore data(r' (f (a))) = data(r' (f (x))) ^ 
0. We have that either op(r'(/(x))) = R or x = a. In the first case /(a) G 
upto(r' , /(x)) which implies that /(a) < /(x), and in the second case /(a) = 
/(x). Therefore f(a) < /(x). 

If (x, y) G J7 e (r, j) then we have three cases. In each case, we show that 
fix) < f(y). 

1. data(r(x)) = data(r(y)), op(r(x)) = W, and op(r(y)) = i?. Since r is un- 
ambiguous data(r(x)) — data(r{y)) ^ 0. We get that data(r'(f(y))) ^ 
which means that upto(r' ', /(y)) ^ and /(x) G upto{r' ', f(y))- Therefore 

m<m- 

2. data(r(x)) = and data(r(y)) ^ 0. Since x =/= y we have /(x) ^ /(?/)• 
Suppose /(y) < /(x). Since data(r(y)) ^ there is 6 G L w (r,j) such 
that data(r(b)) = data(r(y)). Therefore we have that /(&) < /(y) < 
/(x). Therefore the set upto(r', /(x)) 7^ 0. Since t' is unambiguous and 
data{r' (/(x))) = 0we have a contradiction. 

3. There exist $a, b \in L^w(\tau, j)$ such that $(a, b) \in \Omega(\tau, j)$, $data(\tau(a)) = data(\tau(x))$, and $data(\tau(b)) = data(\tau(y))$. We show $f(x) < f(y)$ by contradiction. Suppose $f(x) = f(y)$. Then $x = y$ and $data(\tau(a)) = data(\tau(b))$. Since $\tau$ is unambiguous, we get $a = b$, which contradicts $(a, b) \in \Omega(\tau, j)$. Suppose $f(y) < f(x)$. We have that $f(a) \le f(x)$ and $f(b) \le f(y)$. Since $(a, b) \in \Omega(\tau, j)$, we have $f(a) < f(b)$ from the definition of $\Omega$. Thus $f(a) < f(b) \le f(y) < f(x)$. Therefore $f(a) \ne \max(upto(\tau', f(x)))$. Since $\tau'$ is unambiguous and $data(\tau'(f(a))) = data(\tau'(f(x)))$, we have a contradiction.

($\Leftarrow$) Suppose $\tau$ is a trace of $S(n, m, v)$. Then there is a witness $\Omega$ such that $G(\Omega)(\tau)$ is acyclic. Let $f$ be a linearization of the vertices in $G(\Omega)(\tau)$ that respects all edges. Then C1 is satisfied. Let $\tau'$ denote $\tau_{f^{-1}(1)} \tau_{f^{-1}(2)} \cdots \tau_{f^{-1}(|\tau|)}$. Then $\tau'(x) = \tau(f^{-1}(x))$ for all $1 \le x \le |\tau'|$. For any $1 \le x \le |\tau'|$, suppose $loc(\tau'(x)) = j$. There are two cases.

1. $data(\tau'(x)) = 0$. We show that $upto(\tau', x) = \emptyset$. Consider any vertex $1 \le y \le |\tau'|$ such that $op(\tau'(y)) = W$ and $loc(\tau'(y)) = j$. Then $data(\tau(f^{-1}(x))) = 0$ and $data(\tau(f^{-1}(y))) \ne 0$. Therefore $(f^{-1}(x), f^{-1}(y)) \in \Omega^e(\tau, j)$, and $(f^{-1}(x), f^{-1}(y))$ is an edge in $G(\Omega)(\tau)$. Therefore $f(f^{-1}(x)) < f(f^{-1}(y))$, that is, $x < y$. Thus we have that $upto(\tau', x) = \emptyset$.

2. $data(\tau'(x)) \ne 0$. We show that $upto(\tau', x) \ne \emptyset$ and, if $y = \max(upto(\tau', x))$, then $data(\tau'(x)) = data(\tau'(y))$. From Assumption 1 there is $a \in L^w(\tau', j)$ such that $data(\tau'(a)) = data(\tau'(x))$, and since $\tau'$ is unambiguous this write is unique. Therefore $data(\tau(f^{-1}(a))) = data(\tau(f^{-1}(x)))$. Either $f^{-1}(a) = f^{-1}(x)$, or $op(\tau(f^{-1}(x))) = R$, in which case $(f^{-1}(a), f^{-1}(x)) \in \Omega^e(\tau, j)$. In both cases, we have $a \le x$ and therefore $upto(\tau', x) \ne \emptyset$. Consider any vertex $1 \le b \le |\tau'|$ such that $op(\tau'(b)) = W$, $loc(\tau'(b)) = j$, and $a < b$. Then $(f^{-1}(a), f^{-1}(b)) \in \Omega(\tau, j)$ (otherwise case 3 of the definition of $\Omega^e$ would give the edge $(f^{-1}(b), f^{-1}(a))$, forcing $b < a$), and hence $(f^{-1}(x), f^{-1}(b)) \in \Omega^e(\tau, j)$. Therefore $x < b$. We thus get $a = \max(upto(\tau', x))$.
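The two conditions used in this proof can also be checked mechanically on a concrete trace. The Python sketch below is only an illustration under assumptions made for it: events are tuples (op, proc, loc, data) with 0 as the initial value, positions are 0-based rather than 1-based, C1 is read as "f preserves the order of events at each processor", and C2 as "τ' behaves like serial memory", which is how both conditions are used above. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Event = Tuple[str, int, int, int]  # (op, proc, loc, data); op is "R" or "W"


def upto(trace: Sequence[Event], x: int) -> List[int]:
    """Positions y <= x of writes to the location of trace[x] (x itself included)."""
    loc = trace[x][2]
    return [y for y in range(x + 1) if trace[y][0] == "W" and trace[y][2] == loc]


def is_serial(trace: Sequence[Event]) -> bool:
    """Serial memory: each event carries the value of the last write to its
    location at or before it, or 0 if there is no such write."""
    for x, event in enumerate(trace):
        writes = upto(trace, x)
        expected = trace[max(writes)][3] if writes else 0
        if event[3] != expected:
            return False
    return True


def permute(trace: Sequence[Event], f: Sequence[int]) -> List[Event]:
    """Build tau' with tau'(f(u)) = tau(u), where f is a 0-based permutation."""
    inverse = sorted(range(len(f)), key=lambda u: f[u])  # inverse[p] = f^{-1}(p)
    return [trace[u] for u in inverse]


def respects_program_order(trace: Sequence[Event], f: Sequence[int]) -> bool:
    """C1: if x < y are events of the same processor, then f(x) < f(y)."""
    return all(
        f[x] < f[y]
        for x in range(len(trace))
        for y in range(x + 1, len(trace))
        if trace[x][1] == trace[y][1]
    )
```

For example, if processor 1 writes 1 to location 1 and processor 2 then reads 0 from location 1, the temporal order is not serial, but the permutation that swaps the two events satisfies both conditions.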



Theorems 4.1 and 5.2 can be combined easily to yield the following corollary.



Corollary 5.4 For all $n, m \ge 1$, every trace of $S(n, m)$ is sequentially consistent iff there is a witness $\Omega$ such that the graph $G(\Omega)(\tau)$ is acyclic for every unambiguous trace $\tau$ of $S(n, m)$.

5.2 Simple witness 

Corollary 5.4 suggests that in order to prove that the memory system $S(n, m, v)$ is sequentially consistent, we produce a witness $\Omega$ and show, for every unambiguous trace $\tau$ of $S(n, m)$, that the graph $G(\Omega)(\tau)$ is acyclic. But the construction of the witness is still left to the verifier. In this section, we argue that a simple witness, which orders the write events to a location exactly in the order in which they occur, suffices for a number of memory systems occurring in practice. Formally, a witness $\Omega$ is simple if for all traces $\tau$ of $S(n, m, v)$ and locations $1 \le i \le m$, we have $(x, y) \in \Omega(\tau, i)$ iff $x < y$, for all $x, y \in L^w(\tau, i)$.
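To make the construction concrete, the following Python sketch builds $G(\Omega)(\tau)$ for the simple witness directly from the three cases in the definition of $\Omega^e$ and tests it for cycles with Kahn's algorithm. It is a brute-force illustration for small traces that are assumed (not checked) to be unambiguous, not the model checking procedure of Section 8; the event encoding (op, proc, loc, data), the 0-based positions, and the function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Sequence, Set, Tuple

Event = Tuple[str, int, int, int]  # (op, proc, loc, data); data 0 is the initial value


def constraint_graph(trace: Sequence[Event]) -> Dict[int, Set[int]]:
    """Edges of G(Omega)(tau) for the simple witness, on 0-based positions.

    Assumes tau is unambiguous: each write stores a nonzero value that no
    other write to the same location stores.
    """
    # Simple witness: writes to a location are ordered by their position in tau.
    write_pos = {(loc, d): x for x, (op, _, loc, d) in enumerate(trace) if op == "W"}
    edges: Dict[int, Set[int]] = defaultdict(set)
    for x, (ox, px, lx, dx) in enumerate(trace):
        for y, (oy, py, ly, dy) in enumerate(trace):
            if x == y:
                continue
            if px == py and x < y:  # M(tau, i): program order
                edges[x].add(y)
            if lx != ly:
                continue
            read_from = dx == dy and ox == "W" and oy == "R"  # case 1
            initial_first = dx == 0 and dy != 0  # case 2
            write_order = (  # case 3
                (lx, dx) in write_pos
                and (ly, dy) in write_pos
                and write_pos[(lx, dx)] < write_pos[(ly, dy)]
            )
            if read_from or initial_first or write_order:
                edges[x].add(y)
    return edges


def is_acyclic(num_vertices: int, edges: Dict[int, Set[int]]) -> bool:
    """Kahn's algorithm: the graph is acyclic iff every vertex gets removed."""
    indegree = [0] * num_vertices
    for successors in edges.values():
        for y in successors:
            indegree[y] += 1
    ready = deque(v for v in range(num_vertices) if indegree[v] == 0)
    removed = 0
    while ready:
        v = ready.popleft()
        removed += 1
        for y in edges.get(v, ()):
            indegree[y] -= 1
            if indegree[y] == 0:
                ready.append(y)
    return removed == num_vertices
```

On a store-buffering trace, in which processor 1 writes 1 to location 1, processor 2 writes 1 to location 2, and each then reads the initial value 0 from the other location, the graph contains the cycle 0 → 2 → 1 → 3 → 0 (0-based positions). Each location has a single write, so the simple witness is the only witness for this trace, and by the equivalence established above the cycle shows that the trace is not sequentially consistent.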

Consider the memory system of Figure 1. We argue informally that a simple witness is a good witness for this memory system. Permission to perform writes flows from one cache to another by means of the ACKX message. Note that for each location $j$, the variable owner[j] is set to 0 (which is not the id of any processor) when an ACKX message is generated. When the ACKX message is received at the destination (by the UPD event), the destination moves to EXC state and sets owner[j] to the destination id. A new ACKX message is generated only when $owner[j] \ne 0$. Thus, the memory system has the property that each memory location can be held in EXC state by at most one cache. Moreover, writes to the location $j$ can happen only when the cache has the location in EXC state. Therefore, at most one cache can be performing writes to a memory location. This indicates that the logical order of the write events is the same as their temporal order. In other words, a simple witness is the correct witness for demonstrating that a run is sequentially consistent.

In general, for any memory system in which at any time at most one processor can perform write events to a location, a simple witness is very likely to be the correct witness. Most memory systems occurring in practice [LLG+90, KOH+94, BDH+99, BGM+00] have this property. In Section 8, we describe a model checking algorithm to verify the correctness of a memory system with respect to a simple witness. If a simple witness is indeed the desired witness and the memory system is designed correctly, then our algorithm will be able to verify its correctness. Otherwise, it will produce an error trace suggesting to the verifier that either there is an error in the memory system or the simple witness is not the correct witness. Thus our method for checking sequential consistency is clearly sound. We have argued that it is also complete on most shared-memory systems that occur in practice.



6 Nice cycle reduction 

For some $n, m, v \ge 1$, let $S(n, m, v)$ be a memory system and $\Omega$ a witness for it. Let $\tau$ be an unambiguous trace of $S(n, m, v)$. Corollary 5.4 tells us that the absence of cycles in the graphs $G(\Omega)(\tau)$ generated by the unambiguous traces of $S(n, m)$ is a necessary and sufficient condition for every trace of $S(n, m)$ to be sequentially consistent. In this section, we show that it suffices to detect a special class of cycles called nice cycles. In Section 8, we will show that detection of nice cycles can be performed by model checking.

We fix some $k \ge 1$ and use the symbol $\oplus$ to denote addition over the additive group with elements $\mathbb{N}_k$ and identity element $k$. A $k$-nice cycle in $G(\Omega)(\tau)$ is a sequence $u_1, v_1, \ldots, u_k, v_k$ of distinct vertices in $\mathbb{N}_{|\tau|}$ such that the following conditions are true.

1. For all $1 \le x \le k$, we have $(u_x, v_x) \in M(\tau, i)$ for some $1 \le i \le n$ and $(v_x, u_{x \oplus 1}) \in \Omega^e(\tau, j)$ for some $1 \le j \le m$.

2. For all $1 \le x < y \le k$ and for all $1 \le i, j \le n$, if $(u_x, v_x) \in M(\tau, i)$ and $(u_y, v_y) \in M(\tau, j)$, then $i \ne j$.

3. For all $1 \le x < y \le k$ and for all $1 \le i, j \le m$, if $(v_x, u_{x \oplus 1}) \in \Omega^e(\tau, i)$ and $(v_y, u_{y \oplus 1}) \in \Omega^e(\tau, j)$, then $i \ne j$.

In a $k$-nice cycle, no two edges belong to the relation $M(\tau, i)$ for any processor $i$. Similarly, no two edges belong to the relation $\Omega^e(\tau, j)$ for any location $j$. The above definition also implies that if a cycle is $k$-nice, then $k \le \min(\{n, m\})$.
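A short Python sketch of the group operation $\oplus$ and of conditions 1 to 3, stated on edge labels only, may help when reading the proofs that follow. It is an illustration with hypothetical names: each edge of a candidate cycle is labeled ("M", i) for $M(\tau, i)$ or ("L", j) for $\Omega^e(\tau, j)$, and whether the edges actually exist and the vertices are distinct is assumed to be checked elsewhere.

```python
from typing import List, Tuple

Label = Tuple[str, int]  # ("M", processor i) for M(tau, i), ("L", location j) for Omega^e(tau, j)


def oplus(x: int, y: int, k: int) -> int:
    """Addition in the group with elements 1..k and identity element k."""
    return (x + y - 1) % k + 1


def is_k_nice_labels(labels: List[Label]) -> bool:
    """Conditions 1-3 on the edge labels of a candidate cycle u_1, v_1, ..., u_k, v_k.

    labels[2(x-1)] labels the edge (u_x, v_x) and labels[2(x-1)+1] labels
    (v_x, u_{x (+) 1}). Edge existence and vertex distinctness are not checked.
    """
    if not labels or len(labels) % 2 != 0:
        return False
    proc_edges, loc_edges = labels[0::2], labels[1::2]
    # Condition 1: edges alternate between M and Omega^e.
    if any(kind != "M" for kind, _ in proc_edges) or any(kind != "L" for kind, _ in loc_edges):
        return False
    procs = [i for _, i in proc_edges]
    locs = [j for _, j in loc_edges]
    # Conditions 2 and 3: no processor and no location is used twice.
    return len(set(procs)) == len(procs) and len(set(locs)) == len(locs)
```

Since a cycle that passes this check uses $k$ distinct processors and $k$ distinct locations, the bound $k \le \min(\{n, m\})$ is immediate.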

Theorem 6.1 If the graph $G(\Omega)(\tau)$ has a cycle, then it has a $k$-nice cycle for some $k$ such that $1 \le k \le \min(\{n, m\})$.

Proof: Suppose $G(\Omega)(\tau)$ has no $k$-nice cycles but does have a cycle. Consider the shortest such cycle $u_1, \ldots, u_l$, where $l > 1$. For this proof, we denote by $\oplus$ addition over the additive group with elements $\mathbb{N}_l$ and identity element $l$. Then for all $1 \le x \le l$, either $(u_x, u_{x \oplus 1}) \in M(\tau, i)$ for some $1 \le i \le n$ or $(u_x, u_{x \oplus 1}) \in \Omega^e(\tau, i)$ for some $1 \le i \le m$.

Since the cycle $u_1, \ldots, u_l$ is not $k$-nice for any $k$, there are $1 \le a < b \le l$ such that either (1) $(u_a, u_{a \oplus 1}) \in M(\tau, i)$ and $(u_b, u_{b \oplus 1}) \in M(\tau, i)$ for some $1 \le i \le n$, or (2) $(u_a, u_{a \oplus 1}) \in \Omega^e(\tau, i)$ and $(u_b, u_{b \oplus 1}) \in \Omega^e(\tau, i)$ for some $1 \le i \le m$.

Case (1). We have from the definition of $M$ that $u_a < u_{a \oplus 1}$ and $u_b < u_{b \oplus 1}$. Either $u_a < u_b$ or $u_b < u_a$. If $u_a < u_b$, then $u_a < u_{b \oplus 1}$, that is, $(u_a, u_{b \oplus 1}) \in M(\tau, i)$. If $u_b < u_a$, then $u_b < u_{a \oplus 1}$, that is, $(u_b, u_{a \oplus 1}) \in M(\tau, i)$. In both cases, we have a contradiction since the cycle can be made shorter.

Case (2). From Lemma 5.1, either $(u_a, u_{b \oplus 1}) \in \Omega^e(\tau, i)$ or $(u_b, u_{a \oplus 1}) \in \Omega^e(\tau, i)$. In both cases, we have a contradiction since the cycle can be made shorter. ■

7 Symmetry reduction 

Suppose $S(n, m, v)$ is a memory system for some $n, m, v \ge 1$. In this section, we use symmetry arguments to further reduce the class of cycles that need to be detected in constraint graphs. Each $k$-nice cycle has $2 \times k$ edges, with one edge each for $k$ different processors and $k$ different locations. These edges can potentially occur in any order, yielding a set of isomorphic cycles. But if the memory system $S(n, m, v)$ is symmetric with respect to processor and memory location ids, the presence of any one of the isomorphic nice cycles implies the existence of a nice cycle in which the edges are arranged in a canonical order. Thus, it suffices to search for a cycle with edges in a canonical order.



We discuss processor symmetry in Section 7.1 and location symmetry in Section 7.2. We combine processor and location symmetry to demonstrate the reduction from nice cycles to canonical nice cycles in Section 7.3.



7.1 Processor symmetry 

For any permutation $\lambda$ on $\mathbb{N}_n$, the function $\lambda^p$ on $E(n, m, v)$ permutes the processor ids of events according to $\lambda$. Formally, for all $e = (a, b, c, d) \in E(n, m, v)$, we define $\lambda^p(e) = (a, \lambda(b), c, d)$. The function $\lambda^p$ is extended to sequences in $E(n, m, v)^*$ in the natural way.
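The following Python sketch applies $\lambda^p$ to a trace and computes $M(\tau, i)$, which makes it easy to spot-check the statement of Lemma 7.1 on small examples. The tuple encoding (op, proc, loc, data), the dictionary representation of $\lambda$, the 0-based positions, and the function names are assumptions of this illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple

Event = Tuple[str, int, int, int]  # (op, proc, loc, data)


def permute_processors(lam: Dict[int, int], trace: Sequence[Event]) -> List[Event]:
    """lambda^p extended to sequences: (a, b, c, d) becomes (a, lam[b], c, d)."""
    return [(op, lam[proc], loc, data) for (op, proc, loc, data) in trace]


def program_order(trace: Sequence[Event], i: int) -> Set[Tuple[int, int]]:
    """M(tau, i) on 0-based positions: pairs x < y of events of processor i."""
    positions = [x for x, e in enumerate(trace) if e[1] == i]
    return {(x, y) for a, x in enumerate(positions) for y in positions[a + 1:]}
```

Because $\lambda^p$ changes only processor ids and never positions, the pairs in $M(\tau, i)$ reappear unchanged as $M(\lambda^p(\tau), \lambda(i))$.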

Assumption 3 (Processor symmetry) For every permutation $\lambda$ on $\mathbb{N}_n$ and for all traces $\tau$ of the memory system $S(n, m, v)$, we have that $\lambda^p(\tau)$ is a trace of $S(n, m, v)$.

We argue informally that the memory system in Figure 1 satisfies Assumption 3. The operations performed by the various parameterized actions on the state variables that store processor ids are symmetric. Suppose $s$ is a state of the system. We denote by $\lambda^p(s)$ the state obtained by permuting the values of variables that store processor ids according to $\lambda$. Then, for example, if the action $UPD(i)$ in some state $s$ yields state $t$, then the action $UPD(\lambda(i))$ in state $\lambda^p(s)$ yields the state $\lambda^p(t)$. Thus, from any run $\sigma$ we can construct another run $\lambda^p(\sigma)$. If a shared-memory system is described with symmetric types, such as scalarsets [ID96], used to model variables containing processor ids, then it has the property of processor symmetry by construction.

The following lemma states that the sequential consistency memory model is symmetric with respect to processor ids. It states that two events in a trace $\tau$ ordered by sequential consistency remain ordered under any permutation of processor ids.

Lemma 7.1 Suppose $\lambda$ is a permutation on $\mathbb{N}_n$. Suppose $\tau$ and $\tau'$ are traces of $S(n, m, v)$ such that $\tau' = \lambda^p(\tau)$. Then for all $1 \le x, y \le |\tau|$ and for all $1 \le i \le n$, we have that $(x, y) \in M(\tau, i)$ iff $(x, y) \in M(\tau', \lambda(i))$.

Proof: For all $1 \le x, y \le |\tau|$ and for all $1 \le i \le n$, we have that

$$
\begin{aligned}
(x, y) \in M(\tau, i)
  &\iff proc(\tau(x)) = proc(\tau(y)) = i \text{ and } x < y \\
  &\iff proc(\tau'(x)) = proc(\tau'(y)) = \lambda(i) \text{ and } x < y \\
  &\iff (x, y) \in M(\tau', \lambda(i)).
\end{aligned}
$$

■

The following lemma states that the partial order $\Omega^e$ obtained from a simple witness $\Omega$ is symmetric with respect to processor ids. It states that two events to location $i$ ordered by $\Omega^e(\tau, i)$ in a trace $\tau$ remain ordered under any permutation of processor ids.

Lemma 7.2 Suppose $\Omega$ is a simple witness for the memory system $S(n, m, v)$ and $\lambda$ is a permutation on $\mathbb{N}_n$. Suppose $\tau$ and $\tau'$ are unambiguous traces of $S(n, m, v)$ such that $\tau' = \lambda^p(\tau)$. Then for all $1 \le x, y \le |\tau|$ and for all $1 \le i \le m$, we have that $(x, y) \in \Omega^e(\tau, i)$ iff $(x, y) \in \Omega^e(\tau', i)$.

Proof: We have $(x, y) \in \Omega(\tau, i)$ iff $x < y$ iff $(x, y) \in \Omega(\tau', i)$. From the definition of $\Omega^e(\tau, i)$ we have the following three cases.

1. $data(\tau(x)) = data(\tau(y))$, $op(\tau(x)) = W$, $op(\tau(y)) = R$ iff $data(\tau'(x)) = data(\tau'(y))$, $op(\tau'(x)) = W$, $op(\tau'(y)) = R$.

2. $data(\tau(x)) = 0$ and $data(\tau(y)) \ne 0$ iff $data(\tau'(x)) = 0$ and $data(\tau'(y)) \ne 0$.

3. There exist $a, b \in L^w(\tau, i)$ with $a < b$, $data(\tau(a)) = data(\tau(x))$, and $data(\tau(b)) = data(\tau(y))$ iff there exist $a, b \in L^w(\tau', i)$ with $a < b$, $data(\tau'(a)) = data(\tau'(x))$, and $data(\tau'(b)) = data(\tau'(y))$.

■



7.2 Location symmetry 

For any permutation $\lambda$ on $\mathbb{N}_m$, the function $\lambda^l$ on $E(n, m, v)$ permutes the location ids of events according to $\lambda$. Formally, for all $e = (a, b, c, d) \in E(n, m, v)$, we define $\lambda^l(e) = (a, b, \lambda(c), d)$. The function $\lambda^l$ is extended to sequences in $E(n, m, v)^*$ in the natural way.
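Analogously, a Python sketch of $\lambda^l$ together with the simple-witness order $\Omega(\tau, j)$ on writes illustrates the first step in the proof of Lemma 7.4: renaming locations moves the writes of location $j$ to location $\lambda(j)$ without changing their positions, so $(x, y) \in \Omega(\tau, j)$ iff $(x, y) \in \Omega(\lambda^l(\tau), \lambda(j))$. The tuple encoding, 0-based positions, and function names are assumptions of this illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple

Event = Tuple[str, int, int, int]  # (op, proc, loc, data)


def permute_locations(lam: Dict[int, int], trace: Sequence[Event]) -> List[Event]:
    """lambda^l extended to sequences: (a, b, c, d) becomes (a, b, lam[c], d)."""
    return [(op, proc, lam[loc], data) for (op, proc, loc, data) in trace]


def simple_witness_order(trace: Sequence[Event], j: int) -> Set[Tuple[int, int]]:
    """Omega(tau, j) for the simple witness: pairs x < y of writes to location j."""
    writes = [x for x, e in enumerate(trace) if e[0] == "W" and e[2] == j]
    return {(x, y) for a, x in enumerate(writes) for y in writes[a + 1:]}
```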

Assumption 4 (Location symmetry) For every permutation $\lambda$ on $\mathbb{N}_m$ and for all traces $\tau$ of the memory system $S(n, m, v)$, we have that $\lambda^l(\tau)$ is a trace of $S(n, m, v)$.

We can argue informally that the memory system in Figure 1 satisfies Assumption 4 also. The operations performed by the various parameterized actions on the state variables that store location ids are symmetric. Suppose $s$ is a state of the system. We denote by $\lambda^l(s)$ the state obtained by permuting the values of variables that store location ids according to $\lambda$. Then, for example, if the action $(ACKX, i, j)$ in some state $s$ yields state $t$, then the action $(ACKX, i, \lambda(j))$ in state $\lambda^l(s)$ yields the state $\lambda^l(t)$. If scalarsets are used for modeling variables containing location ids, the shared-memory system will have the property of location symmetry by construction.

The following lemma states that the sequential consistency memory model is symmetric with respect to location ids. It states that two events in a trace $\tau$ ordered by sequential consistency remain ordered under any permutation of location ids.

Lemma 7.3 Suppose $\lambda$ is a permutation on $\mathbb{N}_m$. Suppose $\tau$ and $\tau'$ are traces of $S(n, m, v)$ such that $\tau' = \lambda^l(\tau)$. Then for all $1 \le x, y \le |\tau|$ and for all $1 \le i \le n$, we have that $(x, y) \in M(\tau, i)$ iff $(x, y) \in M(\tau', i)$.

Proof: For all $1 \le x, y \le |\tau|$ and for all $1 \le i \le n$, we have that

$$
\begin{aligned}
(x, y) \in M(\tau, i)
  &\iff proc(\tau(x)) = proc(\tau(y)) = i \text{ and } x < y \\
  &\iff proc(\tau'(x)) = proc(\tau'(y)) = i \text{ and } x < y \\
  &\iff (x, y) \in M(\tau', i).
\end{aligned}
$$

■

The following lemma states that the partial order $\Omega^e$ obtained from a simple witness $\Omega$ is symmetric with respect to location ids. It states that two events to location $i$ ordered by $\Omega^e(\tau, i)$ in a trace $\tau$ remain ordered under any permutation of location ids.

Lemma 7.4 Suppose $\Omega$ is a simple witness for the memory system $S(n, m, v)$ and $\lambda$ is a permutation on $\mathbb{N}_m$. Suppose $\tau$ and $\tau'$ are unambiguous traces of $S(n, m, v)$ such that $\tau' = \lambda^l(\tau)$. Then for all $1 \le x, y \le |\tau|$ and for all $1 \le i \le m$, we have that $(x, y) \in \Omega^e(\tau, i)$ iff $(x, y) \in \Omega^e(\tau', \lambda(i))$.

Proof: We have $(x, y) \in \Omega(\tau, i)$ iff $x < y$ iff $(x, y) \in \Omega(\tau', \lambda(i))$. From the definition of $\Omega^e(\tau, i)$ we have the following three cases.

1. $data(\tau(x)) = data(\tau(y))$, $op(\tau(x)) = W$, $op(\tau(y)) = R$ iff $data(\tau'(x)) = data(\tau'(y))$, $op(\tau'(x)) = W$, $op(\tau'(y)) = R$.

2. $data(\tau(x)) = 0$ and $data(\tau(y)) \ne 0$ iff $data(\tau'(x)) = 0$ and $data(\tau'(y)) \ne 0$.

3. There exist $a, b \in L^w(\tau, i)$ with $a < b$, $data(\tau(a)) = data(\tau(x))$, and $data(\tau(b)) = data(\tau(y))$ iff there exist $a, b \in L^w(\tau', \lambda(i))$ with $a < b$, $data(\tau'(a)) = data(\tau'(x))$, and $data(\tau'(b)) = data(\tau'(y))$.

■



7.3 Combining processor and location symmetry 

We fix some $k \ge 1$ and use the symbol $\oplus$ to denote addition over the additive group with elements $\mathbb{N}_k$ and identity element $k$. A $k$-nice cycle $u_1, v_1, \ldots, u_k, v_k$ is canonical if $(u_x, v_x) \in M(\tau, x)$ and $(v_x, u_{x \oplus 1}) \in \Omega^e(\tau, x \oplus 1)$ for all $1 \le x \le k$. In other words, the processor edges in a canonical nice cycle are arranged in increasing order of processor ids. Similarly, the location edges are arranged in increasing order of location ids. The following theorem claims that if the constraint graph of a run has a nice cycle, then there is some run with a canonical nice cycle as well.

Theorem 7.5 Suppose $\Omega$ is a simple witness for the memory system $S(n, m, v)$. Let $\tau$ be an unambiguous trace of $S(n, m, v)$. If the graph $G(\Omega)(\tau)$ has a $k$-nice cycle, then there is an unambiguous trace $\tau''$ of $S(n, m, v)$ such that $G(\Omega)(\tau'')$ has a canonical $k$-nice cycle.

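Proof: Let $u_1, v_1, \ldots, u_k, v_k$ be a $k$-nice cycle in $G(\Omega)(\tau)$. Let $1 \le i_1, \ldots, i_k \le n$ and $1 \le j_1, \ldots, j_k \le m$ be such that $(u_x, v_x) \in M(\tau, i_x)$ and $(v_x, u_{x \oplus 1}) \in \Omega^e(\tau, j_{x \oplus 1})$ for all $1 \le x \le k$. By conditions 2 and 3 of the definition of a $k$-nice cycle, the ids $i_1, \ldots, i_k$ are distinct, and so are $j_1, \ldots, j_k$. Let $\alpha$ be a permutation on $\mathbb{N}_n$ that maps $i_x$ to $x$ for all $1 \le x \le k$. Then from Assumption 3 there is a trace $\tau'$ of $S(n, m, v)$ such that $\tau' = \alpha^p(\tau)$. Let $\beta$ be a permutation on $\mathbb{N}_m$ that maps $j_x$ to $x$ for all $1 \le x \le k$. Then from Assumption 4 there is a trace $\tau''$ of $S(n, m, v)$ such that $\tau'' = \beta^l(\tau')$. Since $\alpha^p$ and $\beta^l$ do not change data values, $\tau'$ and $\tau''$ are unambiguous. For all $1 \le x \le k$, we have that

$$
\begin{aligned}
(u_x, v_x) \in M(\tau, i_x)
  &\iff (u_x, v_x) \in M(\tau', \alpha(i_x)) = M(\tau', x) && \text{from Lemma 7.1} \\
  &\iff (u_x, v_x) \in M(\tau'', x) && \text{from Lemma 7.3}
\end{aligned}
$$

For all $1 \le x \le k$, we also have that

$$
\begin{aligned}
(v_x, u_{x \oplus 1}) \in \Omega^e(\tau, j_{x \oplus 1})
  &\iff (v_x, u_{x \oplus 1}) \in \Omega^e(\tau', j_{x \oplus 1}) && \text{from Lemma 7.2} \\
  &\iff (v_x, u_{x \oplus 1}) \in \Omega^e(\tau'', \beta(j_{x \oplus 1})) = \Omega^e(\tau'', x \oplus 1) && \text{from Lemma 7.4}
\end{aligned}
$$

Therefore $u_1, v_1, \ldots, u_k, v_k$ is a canonical $k$-nice cycle in $G(\Omega)(\tau'')$. ■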
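The permutations $\alpha$ and $\beta$ in this proof are not unique; any permutation that sends the $k$ ids used by the cycle to $1, \ldots, k$ works. The Python sketch below picks one such completion (sending the unused ids to the remaining targets in increasing order, a choice made only for this illustration) and shows that relabeling the cycle's edges with $\alpha^p$ and $\beta^l$ produces the canonical pattern. The function names and the list encoding of $i_x$ and $j_x$ are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def extend_to_permutation(prefix: Sequence[int], size: int) -> Dict[int, int]:
    """A permutation on 1..size that maps prefix[x-1] to x for x = 1..len(prefix).

    Ids not in prefix are sent, in increasing order, to the remaining targets;
    any other completion would serve equally well.
    """
    if len(set(prefix)) != len(prefix):
        raise ValueError("ids must be distinct (conditions 2 and 3 of k-nice cycles)")
    perm = {src: x for x, src in enumerate(prefix, start=1)}
    rest_src = [s for s in range(1, size + 1) if s not in perm]
    rest_dst = range(len(prefix) + 1, size + 1)
    perm.update(zip(rest_src, rest_dst))
    return perm


def canonical_labels(procs: Sequence[int], locs: Sequence[int], n: int, m: int) -> List[Tuple[str, int]]:
    """Relabel a k-nice cycle with alpha^p and beta^l as in the proof of Theorem 7.5.

    procs[x-1] = i_x, the processor of edge (u_x, v_x); locs[x-1] = j_x, the
    location of the edge entering u_x, so (v_x, u_{x (+) 1}) is at j_{x (+) 1}.
    """
    k = len(procs)
    alpha = extend_to_permutation(procs, n)
    beta = extend_to_permutation(locs, m)
    labels: List[Tuple[str, int]] = []
    for x in range(1, k + 1):
        nxt = x % k + 1  # x (+) 1 in the group with elements 1..k
        labels.append(("M", alpha[procs[x - 1]]))
        labels.append(("L", beta[locs[nxt - 1]]))
    return labels
```

For example, with $n = m = 3$, a 2-nice cycle whose processor edges belong to processors 2 and 1, and whose edges entering $u_1$ and $u_2$ are at locations 3 and 1, is relabeled to $M(\cdot, 1), \Omega^e(\cdot, 2), M(\cdot, 2), \Omega^e(\cdot, 1)$, the canonical order for $k = 2$.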



Finally, Corollary 5.4 and Theorems 6.1 and 7.5 yield the following corollary.



Corollary 7.6 Suppose there is a simple witness $\Omega$ such that for all unambiguous traces $\tau$ of $S(n, m)$, the graph $G(\Omega)(\tau)$ does not have a canonical $k$-nice cycle for any $1 \le k \le \min(\{n, m\})$. Then every trace of $S(n, m)$ is sequentially consistent.



8 Model checking memory systems 

Suppose $S(n, m, v)$ is a memory system for some $n, m, v \ge 1$. Let $\Omega$ be a simple witness for $S(n, m)$. In this section, we present a model checking algorithm that, given a $k$ such that $1 \le k \le \min(\{n, m\})$, determines whether there is a trace $\tau$ in $S(n, m)$ such that the graph $G(\Omega)(\tau)$ has a canonical $k$-nice cycle.



Corollary 7.6 then allows us to verify sequential consistency on $S(n, m, v)$ using $\min(\{n, m\})$ such model checking lemmas. We fix some $k$ such that $1 \le k \le \min(\{n, m\})$. We use the symbol $\oplus$ to denote addition over the additive group with elements $\mathbb{N}_k$ and identity element $k$. The model checking algorithm makes use of $m$ automata named $\mathit{Constrain}_k(j)$ for $1 \le j \le m$, and $k$ automata named $\mathit{Check}_k(i)$ for $1 \le i \le k$. We define these automata formally below.

Automaton $\mathit{Constrain}_k(j)$ for $1 \le j \le k$
States: $\{a, b\}$
Initial state: $a$
Accepting states: $\{a, b\}$
Alphabet: $E(n, m, 2)$
Transitions:
[] $\neg(op(e) = W \wedge loc(e) = j) \;\rightarrow\; s' = s$
[] $s = a \wedge op(e) = W \wedge loc(e) = j \wedge data(e) = 0 \;\rightarrow\; s' = a$
[] $s = a \wedge op(e) = W \wedge loc(e) = j \wedge data(e) = 1 \;\rightarrow\; s' = b$
[] $s = b \wedge op(e) = W \wedge loc(e) = j \wedge data(e) = 2 \;\rightarrow\; s' = b$

Automaton $\mathit{Constrain}_k(j)$ for $k < j \le m$
States: $\{a\}$
Initial state: $a$
Accepting states: $\{a\}$
Alphabet: $E(n, m, 2)$
Transitions:
[] $\neg(op(e) = W \wedge loc(e) = j) \vee data(e) = 0 \;\rightarrow\; s' = s$

Figure 2: Automaton $\mathit{Constrain}_k(j)$

Automaton $\mathit{Check}_k(i)$
States: $\{a, b, err\}$
Initial state: $a$
Accepting states: $\{err\}$
Alphabet: $E(n, m, 2)$
Transitions:
[] $s = a \wedge proc(e) = i \wedge loc(e) = i \wedge data(e) \in \{1, 2\} \;\rightarrow\; s' = b$
[] $s = b \wedge proc(e) = i \wedge loc(e) = i \oplus 1 \wedge (data(e) = 0 \vee (op(e) = W \wedge data(e) = 1)) \;\rightarrow\; s' = err$
[] otherwise $\;\rightarrow\; s' = s$

Figure 3: Automaton $\mathit{Check}_k(i)$

Figure 4: Canonical $k$-nice cycle

For all memory locations $1 \le j \le m$, let $\mathit{Constrain}_k(j)$ be the regular set of sequences in $E(n, m, 2)^*$ represented by the automaton in Figure 2. The automaton $\mathit{Constrain}_k(j)$, when composed with $S(n, m, v)$, constrains the write events to location $j$. If $1 \le j \le k$, then $\mathit{Constrain}_k(j)$ accepts traces where the first few (0 or more) write events have data value 0, followed by exactly one write with data value 1, followed by (0 or more) write events with data value 2. If $k < j \le m$, then $\mathit{Constrain}_k(j)$ accepts traces where all writes to location $j$ have data value 0.
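As a sketch, $\mathit{Constrain}_k(j)$ can be run directly on a finite trace. The Python below follows the transition table of Figure 2; the encoding of events as tuples (op, proc, loc, data) and the function names are assumptions of this illustration. Because both $a$ and $b$ are accepting in Figure 2, a trace whose writes to $j$ all store 0 is also accepted.

```python
from typing import Iterable, Optional, Tuple

Event = Tuple[str, int, int, int]  # (op, proc, loc, data) with data in {0, 1, 2}


def constrain_step(state: str, e: Event, j: int, k: int) -> Optional[str]:
    """One transition of Constrain_k(j); None means no transition is enabled."""
    op, _, loc, data = e
    if not (op == "W" and loc == j):
        return state  # only writes to location j matter
    if j > k:
        return state if data == 0 else None  # k < j <= m: writes must store 0
    if state == "a" and data == 0:
        return "a"
    if state == "a" and data == 1:
        return "b"
    if state == "b" and data == 2:
        return "b"
    return None


def constrain_accepts(trace: Iterable[Event], j: int, k: int) -> bool:
    state = "a"
    for e in trace:
        nxt = constrain_step(state, e, j, k)
        if nxt is None:
            return False  # the automaton blocks: the trace is rejected
        state = nxt
    return True  # every state of Constrain_k(j) is accepting
```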

For all $1 \le i \le k$, let $\mathit{Check}_k(i)$ be the regular set of sequences in $E(n, m, 2)^*$ represented by the automaton in Figure 3. The automaton $\mathit{Check}_k(i)$ accepts a trace $\tau$ if there are events $x$ and $y$ at processor $i$, with $x$ occurring before $y$, such that $x$ is an event to location $i$ with data value 1 or 2 and $y$ is an event to location $i \oplus 1$ with data value 0 or 1. Moreover, the event $y$ is required to be a write event if its data value is 1.
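Similarly, the Python sketch below runs $\mathit{Check}_k(i)$ on a finite trace, following the transition table of Figure 3 with the same hypothetical tuple encoding (op, proc, loc, data). The last guarded command ("otherwise") leaves the state unchanged, so once $err$ is reached the automaton stays there.

```python
from typing import Iterable, Tuple

Event = Tuple[str, int, int, int]  # (op, proc, loc, data) with data in {0, 1, 2}


def check_accepts(trace: Iterable[Event], i: int, k: int) -> bool:
    """Run Check_k(i) from Figure 3; accept iff state err is reached."""
    next_loc = i % k + 1  # i (+) 1 in the group with elements 1..k
    state = "a"
    for op, proc, loc, data in trace:
        if state == "a" and proc == i and loc == i and data in (1, 2):
            state = "b"
        elif (state == "b" and proc == i and loc == next_loc
              and (data == 0 or (op == "W" and data == 1))):
            state = "err"
        # otherwise s' = s; in particular err is never left
    return state == "err"
```

On a store-buffering trace in which processors 1 and 2 each write 1 to their own location and then read 0 from the other location, both $\mathit{Check}_2(1)$ and $\mathit{Check}_2(2)$ reach $err$.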

In order to check for canonical $k$-nice cycles, we compose the memory system $S(n, m, 2)$ with $\mathit{Constrain}_k(j)$ for all $1 \le j \le m$ and with $\mathit{Check}_k(i)$ for all $1 \le i \le k$, and use a model checker to determine if the resulting automaton has an accepting run.

Any accepting run of the composed system has $2 \times k$ events which can be arranged as shown in Figure 4 to yield a canonical $k$-nice cycle. Each processor $i$ for $1 \le i \le k$ and each location $j$ for $1 \le j \le k$ supplies 2 events. Each event is marked by a 4-tuple denoting the possible values for that event. For example, the 4-tuple $(\{R, W\}, 1, 1, \{1, 2\})$ denotes a read event or a write event by processor 1 to location 1 with data value 1 or 2. The edge labeled by $M(\tau, i)$ is due to the total order imposed by sequential consistency on the events at processor $i$. The edge labeled by $\Omega^e(\tau, j)$ is due to the partial order imposed by the simple witness on the events to location $j$. For example, consider the edge labeled $\Omega^e(\tau, 2)$ with the source event labeled by $(\{R, W\}, 1, 2, 0) \vee (W, 1, 2, 1)$ and the sink event labeled by $(\{R, W\}, 2, 2, \{1, 2\})$. In any run of the composed system, the write events to location 2 with value 0 occur before the write event with value 1, which occurs before the write events with value 2. Since $\Omega$ is a simple witness, the partial order $\Omega^e(\tau, 2)$ orders all events labeled with 0 before all events labeled with 1 or 2. Hence any event denoted by $(\{R, W\}, 1, 2, 0)$ is ordered before any event denoted by $(\{R, W\}, 2, 2, \{1, 2\})$. Moreover, the unique write event to location 2 with data value 1 is ordered before any other events with value 1 or 2. Hence the event $(W, 1, 2, 1)$ is ordered before any event denoted by $(\{R, W\}, 2, 2, \{1, 2\})$.

We have given an intuitive argument above that a canonical $k$-nice cycle can be constructed from any accepting run of the composed system. The following theorem proves that it is necessary and sufficient to check that the composed system has an accepting run.

Theorem 8.1 There is a canonical $k$-nice cycle in $G(\Omega)(\tau)$ for some unambiguous trace $\tau$ of $S(n, m)$ iff there is a trace $\tau'$ of $S(n, m, 2)$ such that $\tau' \in \mathit{Constrain}_k(j)$ for all $1 \le j \le m$ and $\tau' \in \mathit{Check}_k(i)$ for all $1 \le i \le k$.

Proof: ($\Rightarrow$) Suppose there is a canonical $k$-nice cycle $u_1, v_1, \ldots, u_k, v_k$ in the graph $G(\Omega)(\tau)$ for some unambiguous trace $\tau$ of $S(n, m)$. Then $(u_x, v_x) \in M(\tau, x)$ and $(v_x, u_{x \oplus 1}) \in \Omega^e(\tau, x \oplus 1)$ for all $1 \le x \le k$. From the definition of $\Omega^e(\tau, x)$, we have that $data(\tau(u_x)) \ne 0$ for all $1 \le x \le k$. Therefore, for all $1 \le x \le k$, there is a unique write event $w_x$ to location $x$ such that $data(\tau(w_x)) = data(\tau(u_x))$.

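For all $1 \le j \le m$, let $V_j$ be the set of data values written by the write events to location $j$ in $\tau$, and let $f_j : V_j \to \mathbb{N}_{|\tau|}$ be the function such that $f_j(v)$ is the index of the unique write event to location $j$ with data value $v$. We define a renaming function $\lambda : \mathbb{N}_m \times \mathcal{W} \to \mathcal{W}_2$ as follows. For all $k < j \le m$ and $v \in \mathcal{W}$, we have $\lambda(j, v) = 0$. For all $1 \le j \le k$ and $v \in \mathcal{W}$, we split the definition into two cases. For $v \in V_j$, we have

$$
\lambda(j, v) =
\begin{cases}
0, & \text{if } f_j(v) < w_j, \\
1, & \text{if } f_j(v) = w_j, \\
2, & \text{if } f_j(v) > w_j.
\end{cases}
$$

For $v \notin V_j$, we have

$$
\lambda(j, v) =
\begin{cases}
0, & \text{if } v = 0, \\
2, & \text{if } v \ne 0.
\end{cases}
$$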

From Assumption 2, there is a trace $\tau'$ of $S(n, m, 2)$ such that $\tau' = \lambda^d(\tau)$. In $\tau'$, for every location $j$ such that $1 \le j \le k$, every write event to $j$ before $w_j$ has the data value 0, the write event at $w_j$ has the data value 1, and the write events to $j$ after $w_j$ have the data value 2. Moreover, for every location $j$ such that $k < j \le m$, every write event has the data value 0. Therefore $\tau' \in \mathit{Constrain}_k(j)$ for all $1 \le j \le m$.

We show that $\tau' \in \mathit{Check}_k(i)$ for all $1 \le i \le k$. Since $(u_i, v_i) \in M(\tau, i)$, we have that $u_i < v_i$ for all $1 \le i \le k$. We already have that $data(\tau'(u_i)) = data(\tau'(w_i)) = 1$ for all $1 \le i \le k$. Therefore all we need to show is that for all $1 \le i \le k$, we have $data(\tau'(v_i)) = 0$, or $op(\tau'(v_i)) = W$ and $data(\tau'(v_i)) = 1$. Since $(v_i, u_{i \oplus 1}) \in \Omega^e(\tau, i \oplus 1)$, one of the following conditions holds.

1. $data(\tau(v_i)) = data(\tau(u_{i \oplus 1}))$, $op(\tau(v_i)) = W$, and $op(\tau(u_{i \oplus 1})) = R$. We have that $op(\tau'(v_i)) = op(\tau(v_i)) = W$. Since $data(\tau(v_i)) = data(\tau(u_{i \oplus 1}))$, we have $data(\tau'(v_i)) = data(\tau'(u_{i \oplus 1})) = 1$. Thus, we get $op(\tau'(v_i)) = W$ and $data(\tau'(v_i)) = 1$.

2. $data(\tau(v_i)) = 0$ and $data(\tau(u_{i \oplus 1})) \ne 0$. From the definition of $\lambda$, we get that $data(\tau'(v_i)) = 0$.

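3. There is $a \in L^w(\tau, i \oplus 1)$ such that $(a, w_{i \oplus 1}) \in \Omega(\tau, i \oplus 1)$ and $data(\tau(a)) = data(\tau(v_i))$. Since $(a, w_{i \oplus 1}) \in \Omega(\tau, i \oplus 1)$ and $\Omega$ is a simple witness, we get $a < w_{i \oplus 1}$. Therefore $\lambda(i \oplus 1, data(\tau(a))) = 0$. Thus $\lambda(i \oplus 1, data(\tau(v_i))) = 0$ and $data(\tau'(v_i)) = 0$.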

Thus, in all cases, we have that either $data(\tau'(v_i)) = 0$, or $op(\tau'(v_i)) = W$ and $data(\tau'(v_i)) = 1$. Therefore $\tau' \in \mathit{Check}_k(i)$.

($\Leftarrow$) Suppose there is a trace $\tau'$ of $S(n, m, 2)$ such that $\tau' \in \mathit{Constrain}_k(j)$ for all $1 \le j \le m$ and $\tau' \in \mathit{Check}_k(i)$ for all $1 \le i \le k$. For all $1 \le i \le k$, let $1 \le u_i < v_i \le |\tau'|$ be such that the automaton $\mathit{Check}_k(i)$ enters state $b$ for the first time on observing $\tau'(u_i)$ and enters state $err$ for the first time on observing $\tau'(v_i)$. Therefore we have $proc(\tau'(u_i)) = i$, $loc(\tau'(u_i)) = i$, and $data(\tau'(u_i)) \in \{1, 2\}$. We also have $proc(\tau'(v_i)) = i$, $loc(\tau'(v_i)) = i \oplus 1$, and either $data(\tau'(v_i)) = 0$, or $op(\tau'(v_i)) = W$ and $data(\tau'(v_i)) = 1$. From Assumption 2, there is an unambiguous trace $\tau$ of $S(n, m)$ and a renaming function $\lambda : \mathbb{N}_m \times \mathcal{W} \to \mathcal{W}_2$ such that $\tau' = \lambda^d(\tau)$. We will show that $u_1, v_1, \ldots, u_k, v_k$ is a canonical $k$-nice cycle in $G(\Omega)(\tau)$. Since $proc(\tau(u_i)) = proc(\tau(v_i)) = i$ and $u_i < v_i$, we have $(u_i, v_i) \in M(\tau, i)$ for all $1 \le i \le k$. We show that $(v_i, u_{i \oplus 1}) \in \Omega^e(\tau, i \oplus 1)$ for all $1 \le i \le k$. First, $loc(\tau(v_i)) = loc(\tau(u_{i \oplus 1})) = i \oplus 1$. For every location $1 \le j \le k$ and all $u, v \in L^w(\tau, j)$, if $\lambda(j, data(\tau(u))) < \lambda(j, data(\tau(v)))$, then $u < v$, from the property of $\mathit{Constrain}_k(j)$. Since $data(\tau'(u_{i \oplus 1})) \in \{1, 2\}$, we have from the property of a renaming function that $data(\tau(u_{i \oplus 1})) \ne 0$. There are two cases on $\tau'(v_i)$.

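1. $data(\tau'(v_i)) = 0$. There are two subcases: $data(\tau(v_i)) = 0$, or $data(\tau(v_i)) \ne 0$ and $\lambda(i \oplus 1, data(\tau(v_i))) = 0$. In the first subcase, since $data(\tau(u_{i \oplus 1})) \ne 0$, we have $(v_i, u_{i \oplus 1}) \in \Omega^e(\tau, i \oplus 1)$. In the second subcase, there are $a, b \in L^w(\tau, i \oplus 1)$ such that $data(\tau(a)) = data(\tau(v_i))$ and $data(\tau(b)) = data(\tau(u_{i \oplus 1}))$. Since $data(\tau'(a)) = 0$ and $data(\tau'(b)) \in \{1, 2\}$, we get from the definition of $\mathit{Constrain}_k(i \oplus 1)$ that $a < b$, that is, $(a, b) \in \Omega(\tau, i \oplus 1)$. Therefore $(v_i, u_{i \oplus 1}) \in \Omega^e(\tau, i \oplus 1)$.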
2. $op(\tau'(v_i)) = W$ and $data(\tau'(v_i)) = 1$. We have that $op(\tau(v_i)) = W$. There is an event $b \in L^w(\tau, i \oplus 1)$ such that $data(\tau(b)) = data(\tau(u_{i \oplus 1}))$. There are two subcases: $data(\tau'(u_{i \oplus 1})) = 1$ or $data(\tau'(u_{i \oplus 1})) = 2$. In the first subcase, we have $v_i = b$, since $\mathit{Constrain}_k(i \oplus 1)$ accepts only traces with a single write event labeled with 1. Therefore $data(\tau(v_i)) = data(\tau(u_{i \oplus 1}))$, $op(\tau(v_i)) = W$, and $op(\tau(u_{i \oplus 1})) = R$, and we get $(v_i, u_{i \oplus 1}) \in \Omega^e(\tau, i \oplus 1)$. In the second subcase, since $data(\tau'(v_i)) = 1$ and $data(\tau'(b)) = 2$, we get from the definition of $\mathit{Constrain}_k(i \oplus 1)$ that $v_i < b$, that is, $(v_i, b) \in \Omega(\tau, i \oplus 1)$. Therefore $(v_i, u_{i \oplus 1}) \in \Omega^e(\tau, i \oplus 1)$.

Therefore $u_1, v_1, \ldots, u_k, v_k$ is a canonical $k$-nice cycle in $G(\Omega)(\tau)$. ■
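The renaming function $\lambda$ from the ($\Rightarrow$) direction of this proof can be computed directly for a concrete unambiguous trace. The Python sketch below is an illustration with hypothetical names: events are tuples (op, proc, loc, data), the caller supplies the 0-based positions $w_j$ for $1 \le j \le k$ (in the proof they come from the cycle), and the code does not check that the trace is unambiguous.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Event = Tuple[str, int, int, int]  # (op, proc, loc, data)


def make_renaming(trace: Sequence[Event], k: int, w: Dict[int, int]) -> Callable[[int, int], int]:
    """lambda(j, v) from the proof of Theorem 8.1.

    w[j] is the 0-based position of the distinguished write w_j, for 1 <= j <= k.
    Assumes the trace is unambiguous, so f_j(v) below is well defined.
    """
    f = {(loc, d): x for x, (op, _, loc, d) in enumerate(trace) if op == "W"}  # f_j(v)

    def lam(j: int, v: int) -> int:
        if j > k:
            return 0
        if (j, v) in f:  # v in V_j
            pos = f[(j, v)]
            return 0 if pos < w[j] else (1 if pos == w[j] else 2)
        return 0 if v == 0 else 2  # v not in V_j

    return lam


def rename_data(lam: Callable[[int, int], int], trace: Sequence[Event]) -> List[Event]:
    """lambda^d: replace the data value of every event using lam(loc, data)."""
    return [(op, proc, loc, lam(loc, data)) for (op, proc, loc, data) in trace]
```

If location 1 receives writes of 5, 7, and 9 and $w_1$ is the write of 7, the renamed writes to location 1 carry 0, 1, and 2, which is the pattern that $\mathit{Constrain}_k(1)$ expects, and any read of 7 is renamed to 1 together with its write.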
Example. We now give an example to illustrate the method described in this section. Although the memory system in Figure 1 is sequentially consistent, an earlier version had an error. The assignment owner[j] := 0 was missing in the guarded command of the action (ACKS, i, j). We modeled the system in TLA+ [Lam94] and model checked the system configuration with two processors and two locations using the model checker TLC [YML99]. The error manifests itself while checking for the existence of a canonical 2-nice cycle. The erroneous behavior is when the system starts in the initial state with all cache lines in SHD state and owner[1] = owner[2] = 1, and then executes the following sequence of 12 events:

1. (ACKX, 2, 2)
2. (UPD, 2)
3. (ACKS, 1, 2)
4. (ACKX, 2, 2)
5. (ACKX, 1, 1)
6. (UPD, 1)
7. (UPD, 1)
8. (W, 1, 1, 1)
9. (R, 1, 2, 0)
10. (UPD, 2)
11. (W, 2, 2, 1)
12. (R, 2, 1, 0)

After event 2, owner[2] = 2, cache[1][2].s = INV, and cache[2][2].s = EXC. Now processor 1 gets a shared ack message (ACKS, 1, 2) for location 2. Note that in the erroneous previous version of the example, this event does not set owner[2] to 0. Consequently, owner[2] = 2 and cache[2][2].s = SHD after event 3. An exclusive ack to processor 2 for location 2 is therefore allowed to happen at event 4. Since the shared ack message to processor 1 in event 3 is still sitting in inQ[1], cache[1][2].s is still INV. Therefore event 4 does not generate an INVAL message to processor 1 for location 2. At event 5, processor 1 gets an exclusive ack message for location 1. This event also inserts an INVAL message on location 1 in inQ[2] behind the ACKX message on location 2. After the UPD events to processor 1 in events 6 and 7, we have cache[1][1].s = EXC and cache[1][2].s = SHD. Processor 1 writes 1 to location 1 and reads from location 2 in the next two events, thereby sending automaton $\mathit{Check}_2(1)$ to the state err. Processor 2 now processes the ACKX message to location 2 in the UPD event 10. Note that processor 2 does not process the INVAL message to location 1 sitting in inQ[2]. At this point, we have cache[2][1].s = SHD and cache[2][2].s = EXC. Processor 2 writes 1 to location 2 and reads from location 1 in the next two events, thereby sending automaton $\mathit{Check}_2(2)$ to the state err. Since there has been only one write event of data value 1 to each location, the run is accepted by $\mathit{Constrain}_2(1)$ and $\mathit{Constrain}_2(2)$ also. ■
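The acceptance argument for this run can be replayed mechanically. The Python sketch below is illustrative only: it does not model the protocol of Figure 1, it assumes that the composed automata observe only the read and write events (their alphabet is $E(n, m, 2)$), it represents protocol actions as tuples so that they can be filtered out, and its function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Run = Sequence[Tuple]  # protocol actions such as ("UPD", 2) mixed with ("W" | "R", proc, loc, data)


def memory_events(run: Run) -> List[Tuple[str, int, int, int]]:
    """Keep only read and write events; the automata's alphabet is E(n, m, 2)."""
    return [a for a in run if a[0] in ("R", "W")]


def constrain_ok(events, j: int, k: int) -> bool:
    """Constrain_k(j): writes to j follow 0*(1 2*)? if j <= k, and are all 0 otherwise."""
    values = [d for op, _, loc, d in events if op == "W" and loc == j]
    if j > k:
        return all(d == 0 for d in values)
    state = "a"
    for d in values:
        if state == "a" and d == 1:
            state = "b"
        elif (state, d) not in (("a", 0), ("b", 2)):
            return False
    return True


def check_ok(events, i: int, k: int) -> bool:
    """Check_k(i): processor i touches location i with value 1 or 2, and later
    location i (+) 1 with value 0 or by writing value 1."""
    state, next_loc = "a", i % k + 1
    for op, proc, loc, d in events:
        if state == "a" and proc == i and loc == i and d in (1, 2):
            state = "b"
        elif state == "b" and proc == i and loc == next_loc and (d == 0 or (op == "W" and d == 1)):
            state = "err"
    return state == "err"


run = [("ACKX", 2, 2), ("UPD", 2), ("ACKS", 1, 2), ("ACKX", 2, 2), ("ACKX", 1, 1),
       ("UPD", 1), ("UPD", 1), ("W", 1, 1, 1), ("R", 1, 2, 0), ("UPD", 2),
       ("W", 2, 2, 1), ("R", 2, 1, 0)]
events = memory_events(run)
k = 2
accepted = all(constrain_ok(events, j, k) for j in (1, 2)) and all(check_ok(events, i, k) for i in (1, 2))
print(accepted)  # True
```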
Note that while checking for canonical $k$-nice cycles, $\mathit{Constrain}_k(j)$ has 2 states for all $1 \le j \le k$ and 1 state for $k < j \le m$. Also, $\mathit{Check}_k(i)$ has 3 states for all $1 \le i \le k$. Therefore, by composing $\mathit{Constrain}_k(j)$ and $\mathit{Check}_k(i)$ with the memory system $S(n, m, 2)$, we increase the state space of the system by a factor of at most $2^k \times 3^k$. In fact, for all locations $k < j \le m$ we are restricting write events to have only the data value 0. Therefore, in practice we might reduce the set of reachable states.
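As a worked instance of this bound, the $k$ two-state automata, the $m - k$ one-state automata, and the $k$ three-state automata multiply to

$$
\underbrace{2^k}_{\mathit{Constrain}_k(j),\; j \le k} \cdot \underbrace{1^{m-k}}_{\mathit{Constrain}_k(j),\; j > k} \cdot \underbrace{3^k}_{\mathit{Check}_k(i)} = 6^k,
$$

so for the configuration with two processors and two locations checked for canonical 2-nice cycles in the example above, the factor is at most $6^2 = 36$.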



9 Related work 



Descriptions of shared-memory systems are parameterized by the number of 
processors, the number of memory locations, and the number of data values. 
The specification for such a system can be either an invariant or a shared- 
memory model. These specifications can be verified for some fixed values of the 
parameters or for arbitrary values of the parameters. The contribution of this 
paper is to provide a completely automatic method based on model checking to 
verify the sequential consistency memory model for fixed parameter values. We 
now describe the related work on verification of shared-memory systems along 
the two axes mentioned above. 

A number of papers have looked at invariant verification. Model checking has been used for fixed parameter values [MS91, CGH+93, EM95, ID96], while mechanical theorem proving [LD92, PD96] has been used for arbitrary parameter values. Methods combining automatic abstraction with model checking [PD95, Del00] have been used to verify snoopy cache-coherence protocols for arbitrary parameter values. McMillan [McM01] has used a combination of theorem proving and model checking to verify the directory-based FLASH cache-coherence protocol [KOH+94] for arbitrary parameter values. A limitation of all these approaches is that they do not explicate the formal connection between the verified invariants and the shared-memory model for the protocol.

There are some papers that have looked at verification of shared-memory
models. Systematic manual proof methods [LLOR99, PSCH98] and theorem
proving [Aro01] have been used to verify sequential consistency for arbitrary
parameter values. These approaches require a significant amount of effort on
the part of the verifier. Our method is completely automatic and is a good
debugging technique that can be applied before using these methods. The
approach of Henzinger et al. [HQR99] and Condon and Hu [CH01] requires a
manually constructed finite state machine called the serializer. The serializer
generates the witness total order for each run of the protocol. By model checking
the system composed of the protocol and the serializer, it can be easily checked
that the witness total order for every run is a trace of serial memory. This idea is
a particular instance of the more general "convenient computations" approach of
Katz and Peled [KP92]. In general, the manual construction of the serializer can
be tedious, and it is infeasible when unbounded storage is required. Our
work is an improvement since the witness total order is deduced automatically
from the simple write order. Moreover, the amount of state we add to the cache-
coherence protocol in order to perform the model checking is significantly less
than that added by the serializer approach. The "test model checking" approach
of Nalumasu et al. [NGMG98] can check a variety of memory models and is
automatic. Their tests are sound but incomplete for sequential consistency. In
contrast, our method offers sound and complete verification for a large
class of cache-coherence protocols.
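As a rough illustration of what it means to check that a witness total order is a trace of serial memory, here is a minimal Python sketch. It is neither the serializer construction of [HQR99, CH01] nor the model checking algorithm of this paper; it simply checks one finite trace against one candidate order. The `Event` tuple and the helper names are hypothetical, and every location is assumed to hold 0 before its first write. `is_serial_trace` checks that every read returns the value of the last write to its location, and `is_witness` additionally checks that the candidate order agrees with each processor's own order of events.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, NamedTuple

class Event(NamedTuple):
    proc: int       # processor that performed the event
    op: str         # "R" for read, "W" for write
    loc: Hashable   # memory location
    val: Hashable   # value read or written

def is_serial_trace(order: Iterable[Event], initial: Hashable = 0) -> bool:
    """True if every read returns the value of the last write to its location.

    Assumption: every location holds `initial` before it is first written.
    """
    memory: Dict[Hashable, Hashable] = {}
    for e in order:
        if e.op == "W":
            memory[e.loc] = e.val
        elif e.op == "R":
            if memory.get(e.loc, initial) != e.val:
                return False
        else:
            raise ValueError(f"unknown operation {e.op!r}")
    return True

def per_processor(events: Iterable[Event]) -> Dict[int, List[Event]]:
    """Project a sequence of events onto each processor, preserving order."""
    proj: Dict[int, List[Event]] = defaultdict(list)
    for e in events:
        proj[e.proc].append(e)
    return dict(proj)

def is_witness(trace: List[Event], witness: List[Event], initial: Hashable = 0) -> bool:
    """True if `witness` reorders `trace` consistently with each processor's
    order of events and is itself a trace of serial memory."""
    return per_processor(trace) == per_processor(witness) and is_serial_trace(witness, initial)
```

For example, if processor 1 writes 1 to `x` and processor 2 then reads 0 from `x`, the temporal order is not serial, but the order that places the read before the write is a valid witness.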

Recently, Glusman and Katz [GK01] have shown that, in general, interpreting
sequential consistency over finite traces is not equivalent to interpreting it
over infinite traces. They have proposed conditions on shared-memory systems
under which the two are equivalent. Their work is orthogonal to ours, and a
combination of the two will allow verification of sequential consistency over
infinite traces for finite parameter values.

10 Conclusions 

We now put the results of this paper in perspective. The causality assumption
(Assumption 1) and the data-independence assumption are critical to our result
that reduces the problem of verifying sequential consistency to model checking.
The processor-symmetry and location-symmetry assumptions are used to reduce
the number of model checking lemmas to min({n, m}) rather than a number
exponential in n and m.

In this paper, the read and write events have been modeled as atomic events.
In most real machines, each read or write event is broken into two separate
events: a request from the processor to the cache, and a response from the cache
to the processor. Any memory model, including sequential consistency, naturally
specifies a partial order on the requests. If the memory system services processor
requests in order, then the order of requests is the same as the order of responses.
In this case, the method described in this paper can be used by identifying the
atomic read and write events with the responses. The case in which the memory
system services requests out of order is not handled by this paper.
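The in-order condition above can be illustrated with a small Python sketch. The `Msg` record, its `tag` field, and the log format are hypothetical assumptions, not part of the paper's formalism. The function keeps a per-processor queue of outstanding requests, rejects any response that does not answer the oldest outstanding request of its processor (the out-of-order case, which this paper does not handle), and otherwise emits the response as an atomic read or write event.

```python
from collections import deque
from typing import Dict, Hashable, List, NamedTuple

class Msg(NamedTuple):
    kind: str               # "req" (processor to cache) or "resp" (cache to processor)
    proc: int               # processor that issued the request
    tag: int                # hypothetical identifier pairing a response with its request
    op: str = ""            # on responses: "R" for read, "W" for write
    loc: Hashable = None    # on responses: memory location
    val: Hashable = None    # on responses: value read or written

class Event(NamedTuple):
    proc: int
    op: str
    loc: Hashable
    val: Hashable

def responses_as_atomic_events(log: List[Msg]) -> List[Event]:
    """Map each response to an atomic event, assuming in-order servicing per processor."""
    outstanding: Dict[int, deque] = {}
    events: List[Event] = []
    for m in log:
        if m.kind == "req":
            outstanding.setdefault(m.proc, deque()).append(m.tag)
        elif m.kind == "resp":
            queue = outstanding.get(m.proc)
            # The response must answer the oldest outstanding request of its processor.
            if not queue or queue[0] != m.tag:
                raise ValueError(f"processor {m.proc}: response {m.tag} is out of order")
            queue.popleft()
            events.append(Event(m.proc, m.op, m.loc, m.val))
        else:
            raise ValueError(f"unknown message kind {m.kind!r}")
    return events
```

A design note: rejecting out-of-order responses, rather than reordering them, mirrors the restriction stated above; the resulting event sequence can then be treated exactly like the atomic events used in the rest of the paper.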

The model checking algorithm described in the paper is sound and complete
with respect to a simple witness for the memory system. In some protocols, for
example the lazy caching protocol [ABM93], the correct witness is not simple.
However, the basic method described in the paper, in which the data values of
writes are constrained by automata, can still be used if ordering decisions about
writes can be made before the written values are read. The lazy caching protocol
has this property, and extending the methods described in the paper to handle it
is part of our future work. We would also like to extend our work to handle other
memory models.